This document contains the following chapters:


Introduction 

A general description of the TMS320C31, its key features, benefits, 
embedded-controller requirements, compatible devices, and development
support. 


TMS320C31 Architectural Overview 
Functional block diagram. TMS320C31 architecture description, hardware 
components, and device operation. Instruction set summary. 


TMS320C31 Features/Performance Comparison 
Comparison of TMS320C31 benchmark performance and feature values 
versus those of other embedded controllers. 


Application Examples 
Four application examples showing how the TMS320C30 and TMS320C31 
have been used for system-control functions in several application areas. 


Development Support 

Discussion of code-generation, debug, and system integration development 
flow. Summarizes features of Texas Instruments simulation and emulation 
development tools and describes available technical documentation and 
technical assistance. 


TMS320C31 Third-Party Support 

Alphabetical listing of third-party manufacturers and suppliers who provide 
development support products for the TMS320C31 and description of their 
products. 


TMS320 DSP Family 
Description of the DSP market, TI’s role in the DSP industry, TMS320 product
roadmap, and the five generations of TMS320 devices. 


Part Ordering Information 
Listings of the hardware and software available from Texas Instruments to 
support the TMS320C31 device. 
Style and Symbol Conventions


This document uses the following conventions.


Program listings, program examples, interactive displays, filenames, and 
symbol names are shown in a special typeface similar to a 
typewriter’s. Examples use a bold version of the special typeface for 
emphasis; interactive displays use a bold version of the special 
typeface to distinguish commands that you enter from items that the 
system displays (such as prompts, command output, error messages, 
etc.). 


Here is a sample program listing: 


0011 0005 0001 .field 1, 2 
0012 0005 0003 .field 3, 4
0013 0005 0006 .field 6, 3 
0014 0006 .even 


Here is an example of a system prompt and a command that you might 
enter: 

C: csr -a /user/ti/simuboard/utilities

Square brackets are also used as part of the pathname specification for
VMS pathnames; in this case, the brackets are actually part of the
pathname (they are not optional).


In syntax descriptions, the instruction, command, or directive is in a bold 
typeface font and parameters are in an italic typeface. Portions of a syntax 
that are in bold should be entered as shown; portions of a syntax that are 
in italics describe the type of information that should be entered. Here is 
an example of a directive syntax: 


.asect "section name", address


.asect is the directive. This directive has two parameters, indicated by 
section name and address. When you use .asect, the first parameter must 
be an actual section name, enclosed in double quotes; the second 
parameter must be an address. 


Two vertical bars (||) identify a parallel instruction. An instruction that is 
preceded by two vertical bars will be executed in parallel with the previous 
instruction in the assembly language source file. Here is an example of a 
parallel instruction: 


   MPYI3  R7, R4, R0
|| ADDI3  *AR3, *AR5--(1), R3


Since the ADDI3 is preceded by two vertical bars, the two lines of
assembly language are considered a single instruction in which both an
integer multiply and an integer add are performed.


An at character (@) preceding a label or expression in an instruction
indicates that direct addressing is being performed and that the label or
expression following the at character is used to form the data address.
Here is an example:


ADDI @0BCDEh, R7


In this instruction, the data address is formed by concatenating 0BCDEh
with the current value of the data page pointer. The contents of this
location are added to R7 and stored in R7.
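

The address formation described above can be sketched in a few lines of Python. This is a minimal illustration rather than TI code: it assumes that the low 8 bits of the data page pointer supply the upper bits of a 24-bit address and that the instruction supplies a 16-bit operand for the lower bits. The function name and the example page value 80h are hypothetical.

```python
def direct_address(dp: int, operand: int) -> int:
    """Form a 24-bit data address from the data page pointer and a 16-bit operand."""
    page = dp & 0xFF             # assumption: only the low 8 bits of DP are used
    offset = operand & 0xFFFF    # 16-bit direct operand taken from the instruction
    return (page << 16) | offset # concatenate page (upper bits) and offset (lower bits)


# Hypothetical page value 80h combined with the operand of the ADDI example
addr = direct_address(0x80, 0x0BCDE)
print(f"{addr:06X}h")  # 80BCDEh
```

Under these assumptions, changing the data page pointer moves the same instruction to a different 64K-word page without changing the operand field.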


Braces ( { and } ) indicate a list. The symbol | (read as or) separates items
within the list. Here’s an example of a list:


{ * | *+ | *- }

This provides three choices: *, *+, or *-.


Unless the list is enclosed in square brackets, you must choose one item 
from the list. 


Some directives can have a varying number of parameters. For example,
the .byte directive can have up to 100 parameters. The syntax for this 
directive is: 


.byte value1 [, ..., valuen]


This syntax shows that .byte must have at least one value parameter, but 
you have the option of supplying additional value parameters, separated 
by commas. 
Related Documentation From Texas Instruments


TMS320C3x User’s Guide (literature number SPRU031) describes the ’C3x
(’C30 and ’C31) 32-bit floating-point microprocessors, developed for
digital signal processing as well as embedded-control applications.
Covered are its architecture, internal register structure, instruction set,
pipeline, specifications, and operation of its DMA and its two serial ports.
Software and hardware applications are included.


TMS320 Floating-Point DSP Optimizing C Compiler User’s Guide
(literature number SPRU034) describes the TMS320 floating-point C
compiler. This C compiler accepts ANSI standard C source code and
produces TMS320 assembly language source code for the ’C3x and
’C4x generations of devices.


TMS320 Floating-Point DSP Assembly Language Tools User’s Guide
(literature number SPRU035) describes the assembly language tools
(assembler, linker, and other tools used to develop assembly language
code), assembler directives, macros, common object file format, and
symbolic debugging directives for the ’C3x and ’C4x generations of
devices.


TMS320C3x C Source Debugger User’s Guide (literature number
SPRU053) tells you how to invoke the ’C3x emulator, evaluation module,
and simulator versions of the C source debugger interface. This book
discusses various aspects of the debugger interface, including window
management, command entry, code execution, data management, and
breakpoints, and includes a tutorial that introduces basic debugger
functionality.


TMS320C30 Hewlett-Packard 64776 Analysis Subsystem User’s Guide
(literature number SPRU071) describes the analysis subsystem, which
supplements the ’C30 emulator capabilities by providing realtime
breakpoint, trace, and timing features. The analysis subsystem can be
used only with the ’C30 emulator.


Trademarks


SPARC and S-bus are trademarks of Sun Microsystems, Inc.
Spirit 30 is a trademark of Sonitech International Inc.
SPOX is a trademark of Spectron Microsystems, Inc.
Tiger 30 is a trademark of DSP Research, Inc.
VPRO-4 is a trademark of Voice Processing Corporation.
If You Need Assistance


Do this...


Call the CRC†:
(800) 336-5236


Or write to:
Texas Instruments Incorporated
Market Communications Manager, MS 736
P.O. Box 1443
Houston, Texas 77251-1443


Call the CRC†:
(800) 336-5236


Call the DSP hotline:
(713) 274-2320


Fill out and return the reader response card at
the end of this book, or send your comments to:
Texas Instruments Incorporated
Technical Publications Manager, MS 702
P.O. Box 1443
Houston, Texas 77251-1443


† Texas Instruments Customer Response Center


Contents


1 Introduction
   1.1 Embedded Controller Requirements
   1.2 TMS320C31 Key Features
   1.3 Compatible Devices
   1.4 TMS320C31 Development Support
   1.5 Benefits of a TMS320C31-Based Embedded System

2 TMS320C31 Architectural Overview
   2.1 TMS320C31 Block Diagram
   2.2 Central Processing Unit (CPU)
      2.2.1 CPU Register File
      2.2.2 Auxiliary Register Arithmetic Units (ARAUs)
      2.2.3 Multiplier
      2.2.4 Arithmetic Logic Unit (ALU)
      2.2.5 CPU Memory Addressing Modes
      2.2.6 Instruction Set Summary
   2.3 Memory Organization
      2.3.1 RAM, ROM, and Cache
      2.3.2 Memory Maps
   2.4 Internal Bus Operation
   2.5 On-Chip Peripherals
      2.5.1 Timers
      2.5.2 Serial Port
   2.6 Direct Memory Access (DMA)
   2.7 External Bus Operation
      2.7.1 External Bus Control Features
      2.7.2 Multiprocessor Support
   2.8 Interrupts
   2.9 TMS320C31 Signal Descriptions

3 TMS320C31 Features/Performance Comparison
   3.1 TMS320C31 Feature Comparison Versus Other Embedded Controllers
   3.2 TMS320C31 Benchmark Performance Versus Other Embedded Controllers
      3.2.1 Dhrystone Benchmark
      3.2.2 Bubble- and Quick-Sort Benchmarks
      3.2.3 matmult Benchmark
      3.2.4 anneal Benchmark
      3.2.5 Benchmark Summary

4 Application Examples
   4.1 Telecommunications Example Using SPOX
      4.1.1 Speech Recognition With TMS320C31 and SPOX
      4.1.2 Lower Cost and More Recognizers
      4.1.3 VPRO-4: A Homogeneous Multi-DSP Architecture
      4.1.4 From Tiger 30 to Realtime Recognition
      4.1.5 A New Level of Interoperability
   4.2 Instrumentation Application and Processor Evaluation Example
      4.2.1 Background and System Description
      4.2.2 Archive Shuffle
      4.2.3 Waveform Processing
      4.2.4 Fast Fourier Transform
      4.2.5 Advantages of a TMS320C31 System
   4.3 Test Equipment Example Using SPOX
      4.3.1 TMS320C30 and SPOX: Merging DSP and Control
      4.3.2 From Proof-of-Concept to the Final Product

5 Development Support
   5.1 TMS320C3x Optimizing ANSI C Compilers
      5.1.1 TMS320C31 Compiler Optimizations
   5.2 TMS320 Programmer’s Interface (C/Assembly Source Debugger)
   5.3 TMS320C31 Assembly Language Tools
   5.4 TMS320C3x Software Simulator
   5.5 TMS320C3x Evaluation Module
   5.6 TMS320C3x Emulator
   5.7 TMS320C3x Application Board With Software Demo
   5.8 HP 64776 Analysis Subsystem
   5.9 TMS320 Technical Support
      5.9.1 Technical Documentation
      5.9.2 Details on Signal Processing Newsletter
      5.9.3 TMS320 Bulletin Board Service
      5.9.4 TMS320 DSP Technical Hotline
      5.9.5 TMS320 Application Software
      5.9.6 Design Workshops
      5.9.7 Design Services
      5.9.8 RTC Locations

6 TMS320C31 Third-Party Support
   6.1 Accelerated Technology, Inc.
   6.2 A.T. Barrett & Associates, Inc.
   6.3 Biomation
   6.4 Byte-BOS
   6.5 Computer Motion, Inc.
   6.6 Electronic Tools GmbH
   6.7 Integrated Motion, Incorporated
   6.8 Loughborough Sound Images Ltd.
   6.9 Precise Software Technologies Inc.
   6.10 Spectron Microsystems Inc.
   6.11 Spectrum Signal Processing Inc.
   6.12 Tartan, Inc.
   6.13 Tektronix
   6.14 Wintriss

A TMS320 DSP Family
   A.1 The DSP Market
   A.2 The TI Role in the DSP Industry
   A.3 The TMS320 Product Roadmap
   A.4 TMS320C1x
   A.5 TMS320C2x
   A.6 TMS320C3x
   A.7 TMS320C4x
   A.8 TMS320C5x

B Part Ordering Information
   B.1 Part Numbers
   B.2 Device and Development Support Tool Prefix Designators
   B.3 Device Suffixes


Figures


TMS320C31 Performance
TMS320C3x Block Diagram
TMS320C3x Development Environment
Benefits of Replacing a Controller/Coprocessor With a TMS320C31-Based
   Embedded System
TMS320C31 Block Diagram
Central Processing Unit (CPU)
Memory Organization
TMS320C31 Memory Maps
Peripheral Modules
DMA Controller
VPRO-4 Hardware Architecture
System Diagram
Doble Test Set-Up
The New Doble M Series System
Data Flow Optimizations for TMS320C31 Compilers
Copy Propagation and Control-Flow Simplification for TMS320C31 Compilers
In-Line Function Expansion for TMS320C31 Compilers
Register Variables and Register Tracking/Targeting
Repeat Blocks, Autoincrement Addressing Modes, Parallel Instructions,
   Strength Reduction, Induction Variable Elimination, Register Variables,
   and Loop Test Replacement for Floating-Point Compilers
TMS320C31 Compiler Delayed Branch Optimizations
Loop Unrolling
The Basic Debugger Display
Debugger’s Data Display
TMS320C3x EVM
TMS320C3x XDS Emulator
HP 64776 Analysis Subsystem
Realtime Application Tasks
MX31 Fitted With a Preliminary CCD Camera Interface Daughter Board
SPOX Architecture
SPOX Debug Support
Open Signal Processing Architecture
AdaScope Debugger Screen
Logic Analyzer Family
TMS320 Device Evolution
TMS320 Device Nomenclature


Tables


CPU Registers
Indirect Addressing
System Control Instruction Summary
Program Flow Control Instruction Summary
Logical and Bit Manipulation Instruction Summary
Load and Store Instruction Summary
Arithmetic Instruction Set Summary
Parallel Instruction Set Summary
TMS320C31 Signal Descriptions
Description of the Fields in Table 3-2
Feature/Performance Comparison of Embedded Controllers
Benchmark Comparison of the TMS320C31 With Embedded Controllers at the
   Same Price Level
RTC Worldwide Locations
TMS320 Family Overview
TMS320 Family Features and Benefits
TMS320C3x Digital Signal Processor Part Numbers
TMS320C3x Support Tool Part Numbers


Chapter 1 


Introduction 


The Texas Instruments low-cost, high-performance TMS320C31 has defined 
a new role for digital signal processors in embedded systems. Well-suited for 
general-purpose use, the TMS320C31 is finding widespread acceptance as 
an embedded controller in applications such as: 


Industrial automation
Telecommunications
Motor control
Automotive
Instrumentation
Laser printers
Scanners
Voice mail


and is expanding the role of DSPs from math support to embedded control. 


The topics covered in this chapter include:


1.1 Embedded Controller Requirements
1.2 TMS320C31 Key Features
1.3 Compatible Devices
1.4 TMS320C31 Development Support
1.5 Benefits of a TMS320C31-Based Embedded System


1.1 Embedded Controller Requirements


An embedded controller is a dedicated processor used in systems or
subsystems such as laser printers, voice mail systems, and bar-code readers
to control a specific set of functions. Unlike a PC or workstation host CPU,
an embedded controller is not accessible to the system user to run different
software packages or to be reprogrammed, but is used to cost-effectively
control a predetermined set of functions. To make these types of systems
successful, embedded controllers must possess the following characteristics:


High overall performance for peripheral device management and data
   flow control
Software compatibility (efficient compilers and realtime operating system
   support)
Mature development tools and third-party support
Flexibility
Reliability
Availability
Low device price
Low system cost


The TMS320C31’s ability to satisfy these needs makes it an excellent choice
when compared to embedded RISC and high-end CISC embedded controllers.


The TMS320C31:


Provides a low-cost solution
Supports a general-purpose programming model
Supports efficient C language compilation
Enables high-performance system control
Supports coprocessor math performance on-chip
Integrates system peripherals on-chip
Allows fast context switching


The TMS320C31 is an embedded controller with dedicated digital signal
processing support that provides low cost, high performance, system
integration, and ease of use. Due to these cost and performance advantages,
the TMS320C31 is displacing RISC and high-end CISC processors in a wide
range of applications across many industries.


1.2 TMS320C31 Key Features


The TMS320C31 includes the features normally associated with a
general-purpose embedded controller, so designing with it is very similar to
designing with RISC or CISC devices. But the ’C31 is distinguished by many
high-performance features not found on processors in its price range:


High performance

   50-ns instruction cycle
   20 MIPS (million instructions per second)
   40 MFLOPS (million floating-point operations per second)
   220 MOPS (million operations per second) (see Figure 1-1)
   80-Mbytes/second I/O bandwidth
   0.200-µs interrupt response
   60-ns and 74-ns devices also available


Register-based, pipelined CPU

   Parallel multiply and arithmetic/logical operations on integer or
      floating-point numbers in a single cycle
   Eight extended-precision registers
   24-bit address space
   Two address generators with eight auxiliary registers, two index
      registers, and two auxiliary register arithmetic units
   32-bit barrel shifter


Powerful instruction set

   Single-cycle instruction execution
   System control and numeric operations
   Two- and three-operand instructions
   Zero-overhead looping
   Single-cycle branching
   Conditional calls and returns
   Flexible addressing modes, including circular addressing and
      autoincrement/decrement modes, allow high-speed data accesses
   Single-cycle parallel math and memory operations
   Interlocked instructions for multiprocessing support


Integrated peripherals

   DMA controller for concurrent I/O and CPU operation
   Two-way set associative instruction cache maximizes performance
      while minimizing system cost
   Flexible serial port for 8/16/24/32-bit transfers, which can be
      configured for general-purpose bit I/O plus two 16-bit timers
   Two 32-bit timers, which can also be configured for bit I/O


Extensive internal busing and parallelism for extremely fast
   data-movement capability

8K bytes of single-cycle dual-access internal RAM support two accesses
   per machine cycle; can act as program memory, data memory, cache to
   external memory, or register file extensions

Memory interface optimized for single-cycle SRAM accesses and
   static-column decode DRAMs for high-speed external memory access while
   maintaining low system cost

Boot loader to load/execute programs from other processors or
   inexpensive EPROMs

On-chip emulation for true nonintrusive visibility and control during debug

132-pin plastic quad flat pack (PQFP) package

Low price

The TMS320C31 is described in detail in Chapter 2. 
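

Several of the headline figures in the list above follow from the 50-ns instruction cycle by simple arithmetic. The sketch below reproduces that arithmetic as a cross-check. It is an illustration, not a TI formula: the function name is hypothetical, the MFLOPS figure assumes one multiply and one ALU operation per cycle, the I/O figure assumes one 32-bit word moved per cycle, and the RAM size assumes two 1K x 32 blocks as drawn in the block diagram.

```python
def derived_figures(cycle_ns: float = 50.0) -> dict:
    """Derive headline figures from the instruction cycle time (illustrative only)."""
    bytes_per_word = 4                          # 32-bit words
    mips = 1000.0 / cycle_ns                    # million instructions per second
    return {
        "mips": mips,
        "mflops": 2 * mips,                     # assumes multiply + ALU op each cycle
        "io_mbytes_per_s": mips * bytes_per_word,  # assumes one word moved per cycle
        "interrupt_cycles": 200.0 / cycle_ns,   # 0.200 us = 200 ns, expressed in cycles
        "ram_bytes": 2 * 1024 * bytes_per_word, # two 1K x 32 RAM blocks
    }


for name, value in derived_figures().items():
    print(f"{name}: {value}")
```

With the default 50-ns cycle, the sketch yields 20 MIPS, 40 MFLOPS, 80 Mbytes/second, a four-cycle interrupt response, and 8192 bytes (8K bytes) of RAM, which agree with the listed values. The 220-MOPS figure is not derived here because the list does not itemize the operations it counts.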


Figure 1-1. TMS320C31 Performance 


1.3 Compatible Devices


The TMS320C31 is one of two members of the TMS320C3x generation of
DSPs. The other member is the TMS320C30, which is object-code compatible
with the ’C31. The ’C30 is identical to the ’C31 except that it has 4K words of
ROM, two serial ports, and a second external bus. For more information on the
TMS320C30, refer to the TMS320C3x User’s Guide (literature number
SPRU031). Figure 1-2 is a block diagram of the TMS320C3x devices. The
shaded areas highlight the features that apply only to the ’C30.

Figure 1-2. TMS320C3x Block Diagram

[Block diagram. Blocks shown: RAM Block 0 (1K x 32), RAM Block 1 (1K x 32),
ROM Block 0 (4K x 32), program cache (64 x 32), data buses, DMA, peripheral
bus, and the CPU with its integer/floating-point multiplier,
integer/floating-point ALU, eight extended-precision registers, address
generators 0 and 1, eight auxiliary registers, and 12 control registers.
Shaded areas are available on the TMS320C30, TMS320C30-27, and
TMS320C30-40.]

In addition, the ’C30 and ’C31 are both source-code compatible with the
TMS320C4x, which is the first DSP designed specifically for parallel
processing. For more information on the ’C4x, refer to the TMS320C4x
Technical Brief (literature number SPRU076).


1.4 TMS320C31 Development Support


The ’C31’s general-purpose, 32-bit architecture and TI’s comprehensive set
of development tools make designing systems with a ’C31 as easy as
designing with a traditional controller. These tools include:


ANSI-compatible optimizing C compiler
Realtime operating system support
The programmer’s interface, a window-based C-source/assembly debugger
Code profiler
Software simulator
Low-cost evaluation module (EVM)
TMS320C3x XDS scan-based emulator
’C3x application board
HP64700 analysis subsystem
Extensive third-party support
Hotline support
Bulletin board support
Thousands of pages of application notes and technical documentation


A complete description of TMS320C31 development support can be found in
Chapter 5. Figure 1-3 illustrates the ’C31 development flow.


Figure 1-3. TMS320C3x Development Environment


TMS320C31 performance benchmarks can be found in Chapter 3, and
detailed system examples are shown in Chapter 4.


1.5 Benefits of a TMS320C31-Based Embedded System


The device price, development environment, external memory cost, and
integrated peripherals of the TMS320C31 are equivalent to those of 32-bit
microcontroller solutions. At the same time, the powerful instruction set and
pipelined CPU provide the system control performance of a RISC processor at
a more affordable price. But the TMS320C31 is superior to RISC/CISC
solutions in numerical performance and emulation capability. This
best-of-both-worlds feature set delivers many benefits to next-generation
embedded systems.


With a TMS320C31, many added-cost system features become reduced-cost
features. Traditional embedded-system architectures use a microcontroller for
system control and a coprocessor (companion math chip, programmable or
special-purpose DSP, or ASIC) for math support. This traditional system
architecture has performance and time-to-market drawbacks because the
designer must learn two different architectures and development
environments, and attempt to implement efficient communications between
different types of processors. Today, designers are using a ’C31 to replace
microcontrollers for higher performance and to reduce system cost and time to
market in dual-processor designs. Also, for even higher performance and
homogeneous systems, multiple ’C31s can be used. The ’C31 offers numerous
advantages for embedded-control applications such as voice mail, industrial
automation, instrumentation, audio, motor control, automotive, and laser
printer systems. Figure 1-4 shows the benefits of replacing a
controller/coprocessor with a TMS320C31.


Figure 1-4. Benefits of Replacing a Controller/Coprocessor With a TMS320C31-Based
Embedded System

[Figure. Benefits shown include reduced data flow and small form factor.]


Chapter 2


TMS320C31 Architectural Overview 


This chapter provides an architectural overview of the TMS320C31 embedded 
processor. An in-depth description of its features can be found in the 
TMS320C3x User’s Guide. 


Topics discussed in this chapter include:


2.1 TMS320C31 Block Diagram
2.2 Central Processing Unit (CPU)
2.3 Memory Organization
2.4 Internal Bus Operation
2.5 On-Chip Peripherals
2.6 Direct Memory Access (DMA)
2.7 External Bus Operation
2.8 Interrupts
2.9 TMS320C31 Signal Descriptions


2.1 TMS320C31 Block Diagram


Figure 2-1 is a block diagram of the TMS320C31 architecture. Throughout
this chapter, refer to this block diagram to better understand the interface of
the components of the ’C31 embedded controller.


Figure 2-1. TMS320C31 Block Diagram


2.2 Central Processing Unit (CPU)

The TMS320C31 has a register-based, pipelined CPU architecture. The ’C31
CPU is similar to a RISC microprocessor CPU in that most instructions execute
in a single cycle. However, the ’C31 instruction set is more powerful: multiple
operations can be performed in a single instruction cycle, and the operands of
logical and arithmetic instructions can be read from memory and operated on
in a single cycle. Because its separate multiplier and ALU are incorporated into
the CPU, the ’C31 supports single-cycle logical and arithmetic operations.
These units do not require pipelined, staged execution to achieve maximum
performance, allowing the ’C31 to achieve low-latency execution of numeric
operations. In addition, the same multiplier and ALU are used for both integer
and floating-point math, providing you flexibility and equal performance for
either data format.


The TMS320C31 can perform a multiply and an ALU operation in a single cycle,
allowing real-time DSP or other math and logical functions to be done in parallel
every cycle, without latency. Hence, a 40-MHz TMS320C31 can sustain 40 MFLOPS
or 40 million integer multiply and accumulate operations per second. In addition
to the integrated math support, the CPU architecture provides a high degree
of on-chip parallelism, allowing on- and off-chip resources to be used most
effectively.
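
The sustained-throughput figure follows directly from the cycle time. The following is a minimal sketch of that arithmetic, assuming the 50-ns instruction cycle quoted for the multiplier later in this chapter and counting one multiply plus one ALU operation per cycle. The function name is illustrative only.

```python
def peak_ops_per_second(cycle_time_ns, ops_per_cycle=2):
    """Peak operations per second for a given instruction cycle time.

    ops_per_cycle=2 assumes one multiply and one ALU operation are
    issued together in every instruction cycle.
    """
    cycles_per_second = 1e9 / cycle_time_ns
    return cycles_per_second * ops_per_cycle


# 50-ns cycle: 20 million cycles per second, two operations per cycle
print(peak_ops_per_second(50) / 1e6, "million operations per second")
```

The same function can be used to compare other cycle times by changing the first argument.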


Figure 2-2 is a block diagram of the ’C31 CPU.


Figure 2-2. Central Processing Unit (CPU)


2.2.1 CPU Register File 


The TMS320C31 provides 28 registers in a multiport register file that is tightly
coupled to the CPU. All of these registers can be operated upon by the multiplier
and ALU and can be used as general-purpose registers. However, the registers
also have some special functions. For example, the eight extended-precision
registers are especially suited for maintaining extended-precision floating-point
results. The eight auxiliary registers support a variety of indirect addressing
modes and can be used as general-purpose 32-bit integer and logical
registers. The remaining registers provide system functions such as addressing,
stack management, processor status, interrupts, and block repeat.


The register names and assigned functions are listed in Table 2-1. Following
the table, the function of each register or group of registers is briefly described.


Table 2-1. CPU Registers


Register Name    Assigned Function

R0               Extended-precision register 0
R1               Extended-precision register 1
R2               Extended-precision register 2
R3               Extended-precision register 3
R4               Extended-precision register 4
R5               Extended-precision register 5
R6               Extended-precision register 6
R7               Extended-precision register 7

AR0              Auxiliary register 0
AR1              Auxiliary register 1
AR2              Auxiliary register 2
AR3              Auxiliary register 3
AR4              Auxiliary register 4
AR5              Auxiliary register 5
AR6              Auxiliary register 6
AR7              Auxiliary register 7

DP               Data-page pointer
IR0              Index register 0
IR1              Index register 1
BK               Block size
SP               System stack pointer

ST               Status register
IE               CPU/DMA interrupt enable
IF               CPU interrupt flags
IOF              I/O flags

RS               Repeat start address
RE               Repeat end address
RC               Repeat counter


The extended-precision registers (R7-R0) are capable of storing and supporting
operations on 32-bit integer and 40-bit floating-point numbers. Any
instruction that assumes the operands are floating-point numbers uses bits
39-0. If the operands are either signed or unsigned integers, only bits 31-0
are used; bits 39-32 remain unchanged. Bits 39-32 also remain unchanged for all
shift operations.


The 32-bit auxiliary registers (AR7-AR0) can be accessed by the CPU and
modified by the two Auxiliary Register Arithmetic Units (ARAUs). The primary 
function of the auxiliary registers is the generation of 24-bit addresses. They 
can also be used as loop counters or as 32-bit general-purpose registers that 
can be modified by the multiplier and ALU. 


The data page pointer (DP) is a 32-bit register. The eight LSBs of the data 
page pointer are used by the direct addressing mode as a pointer to the page 
of data being addressed. Data pages are 64K words long with a total of 256 
pages. 
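
The page arithmetic above can be sketched in a few lines. This is an illustration only: it assumes that the 16 address bits below the page number are supplied by the instruction, which is consistent with 256 pages of 64K words filling a 24-bit address space. The function and constant names are hypothetical.

```python
PAGE_BITS = 16  # 64K words per data page


def direct_address(dp, offset):
    """Form a 24-bit direct address from DP and a 16-bit in-page offset."""
    page = dp & 0xFF                      # only the eight LSBs of DP are used
    return (page << PAGE_BITS) | (offset & 0xFFFF)


# Page 80h, offset 9800h selects address 809800h
print(hex(direct_address(0x80, 0x9800)))
```

Because only the eight LSBs of DP take part, any value written to the upper 24 bits of DP has no effect on the address that is formed.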


The 32-bit index registers (IR0, IR1) contain the values used by the Auxiliary
Register Arithmetic Unit (ARAU) to compute an indexed address.


The ARAU uses the 32-bit block size register (BK) in circular addressing to 
specify the data block size. 
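
As an illustration of how a block size bounds a circular buffer, the sketch below models one circular address update. It is not taken from this chapter: it assumes that the buffer starts on an address whose K least significant bits are zero, where 2^K is greater than the block size, and that the step is no larger than the block size. The function name is hypothetical.

```python
def circ_update(arn, step, bk):
    """Advance ARn by step inside a circular buffer of bk words.

    Assumes the buffer base has its K LSBs clear, with 2**K > bk,
    and that abs(step) <= bk.
    """
    k = bk.bit_length()            # smallest K such that 2**K > bk
    mask = (1 << k) - 1
    base = arn & ~mask             # upper bits locate the buffer
    index = (arn & mask) + step    # lower bits index into the buffer
    if index >= bk:
        index -= bk                # wrap past the end
    elif index < 0:
        index += bk                # wrap before the start
    return base | index


# Four-word buffer at 100h: stepping forward from the last word wraps to the first
print(hex(circ_update(0x103, 1, 4)))
```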


The system stack pointer (SP) is a 32-bit register that contains the address
of the top of the system stack. The SP always points to the last element pushed
onto the stack. A push performs a preincrement, and a pop performs a
postdecrement of the system stack pointer. The SP is manipulated by interrupts,
traps, calls, returns, and the PUSH and POP instructions.
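
The preincrement and postdecrement behavior means that the stack grows toward higher addresses and that SP always holds the address of the most recent entry. A small model, using a dictionary as a stand-in for memory (the class name and the starting address are illustrative only):

```python
class SystemStack:
    """Model of a stack whose pointer addresses the last element pushed."""

    def __init__(self, sp):
        self.sp = sp
        self.memory = {}

    def push(self, value):
        self.sp += 1                  # preincrement, then store
        self.memory[self.sp] = value

    def pop(self):
        value = self.memory[self.sp]  # read, then postdecrement
        self.sp -= 1
        return value


stack = SystemStack(sp=0x809800)
stack.push(5)
stack.push(7)
print(hex(stack.sp), stack.pop(), stack.pop(), hex(stack.sp))
```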


The status register (ST) contains global information relating to the state of the
CPU. Typically, operations set the condition flags of the status register according
to whether the result is zero, negative, etc. This includes register load and
store operations as well as arithmetic and logical functions. When the status
register is loaded, however, a bit-for-bit replacement is performed with the
contents of the source operand, regardless of the state of any bits in the source
operand. Therefore, following a load, the contents of the status register are
equal to the contents of the source operand. This allows the status register to
be easily saved and restored.


The CPU/DMA interrupt enable register (IE) is a 32-bit register. The CPU
interrupt enable bits are in locations 10-0. The DMA interrupt enable bits are
in locations 26-16. A 1 in a CPU/DMA interrupt enable register bit enables the
corresponding interrupt. A 0 disables the corresponding interrupt.
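
The two enable fields can be handled with ordinary bit masks. The sketch below uses only the bit positions given above; the mapping of a field index to a particular interrupt source is not described in this section, so the index n is a hypothetical position within each 11-bit field.

```python
DMA_FIELD_SHIFT = 16   # DMA enable bits occupy locations 26-16


def _check(n):
    if not 0 <= n <= 10:
        raise ValueError("field index must be in the range 0-10")


def enable_cpu_interrupt(ie, n):
    """Set bit n of the CPU enable field (locations 10-0)."""
    _check(n)
    return ie | (1 << n)


def enable_dma_interrupt(ie, n):
    """Set bit n of the DMA enable field (locations 26-16)."""
    _check(n)
    return ie | (1 << (DMA_FIELD_SHIFT + n))


def disable_cpu_interrupt(ie, n):
    """Clear bit n of the CPU enable field, leaving all other bits as they are."""
    _check(n)
    return ie & ~(1 << n) & 0xFFFFFFFF


ie = enable_dma_interrupt(enable_cpu_interrupt(0, 0), 0)
print(hex(ie))
```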


The CPU interrupt flag register (IF) is also a 32-bit register. A 1 in a CPU
interrupt flag register bit indicates that the corresponding interrupt is set. A 0
indicates that the corresponding interrupt is not set.


The I/O flags register (IOF) controls the function of the dedicated external
pins, XF0 and XF1. These pins may be configured for input or output and may
also be read from and written to.


The repeat counter (RC) is a 32-bit register used to specify the number of 
times a block of code is to be repeated when performing a block repeat. When 
the processor is operating in the repeat mode, the 32-bit repeat start address 
register (RS) contains the starting address of the block of program memory 
to be repeated, and the 32-bit repeat end address register (RE) contains the 
ending address of the block to be repeated. 


The program counter (PC) is a 32-bit register containing the address of the
next instruction to be fetched. Although the PC is not part of the CPU register
file, it is a register that can be modified by instructions that modify the program
flow.


2.2.2 Auxiliary Register Arithmetic Units (ARAUs)


Two auxiliary register arithmetic units (ARAU0 and ARAU1) can generate two
addresses in a single cycle. The ARAUs operate in parallel with the multiplier
and ALU. They support addressing with displacements, index registers (IR0
and IR1), and circular and bit-reversed addressing.


2.2.3 Multiplier


The multiplier performs single-cycle multiplications on 24-bit integer and 32-bit
floating-point values. The TMS320C31 implementation of floating-point arithmetic
allows for floating-point operations at fixed-point speeds via a 50-ns
instruction cycle and a high degree of parallelism. To gain even higher throughput,
you can use parallel instructions to perform a multiply and ALU operation
in a single cycle.


When the multiplier performs floating-point multiplication, the inputs are 32-bit
floating-point numbers, and the result is a 40-bit floating-point number. When
the multiplier performs integer multiplication, the input data is 24 bits and yields
a 32-bit result.
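
The integer case can be modeled as follows. This is a sketch under stated assumptions: the operands are treated as 24-bit signed values taken from the low bits of each input, and the 32-bit result is assumed to be the 32 least significant bits of the full product. This section gives only the operand and result widths, so both helper names and the truncation rule are assumptions.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def multiply_integer(a, b):
    """24-bit by 24-bit signed multiply returning a 32-bit signed result."""
    product = sign_extend(a, 24) * sign_extend(b, 24)   # up to 48 bits wide
    return sign_extend(product, 32)                     # keep the 32 LSBs


print(multiply_integer(3, -4))
```

Bits above bit 23 of either input do not affect the result in this model, which is why large 32-bit values should be scaled before an integer multiply.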


2.2.4 Arithmetic Logic Unit (ALU) 


The ALU performs single-cycle operations on 32-bit integer, 32-bit logical, and
40-bit floating-point data, including single-cycle integer and floating-point
conversions. Results of the ALU are always maintained in 32-bit integer or 40-bit
floating-point formats. The barrel shifter is used to shift up to 32 bits left or right
in a single cycle.


Internal buses, CPU1/CPU2 and REG1/REG2, carry two operands from
memory and two operands from the register file, thus allowing parallel
multiplies and adds/subtracts on four integer or floating-point operands in a single
cycle.


2.2.5 CPU Memory Addressing Modes


The TMS320C31 supports a base set of general-purpose instructions as well
as arithmetic-intensive instructions that are particularly suited for digital signal
processing and other numeric-intensive applications.


For use with the general-purpose and arithmetic instructions, five groups of
addressing modes are provided on the TMS320C31. Six types of addressing
may be used within the groups, as shown in the following list:


❏ General addressing modes:
   ■ Register. The operand is a CPU register.
   ■ Short immediate. The operand is a 16-bit immediate value.
   ■ Direct. The operand is the contents of a 24-bit address.
   ■ Indirect. An auxiliary register indicates the address of the operand.

❏ Three-operand addressing modes:
   ■ Register. Same as for general addressing mode.
   ■ Indirect. Same as for general addressing mode.

❏ Parallel addressing modes:
   ■ Register. The operand is an extended-precision register.
   ■ Indirect. Same as for general addressing mode.

❏ Long-immediate addressing mode:
   ■ Long-immediate. The operand is a 24-bit immediate value.

❏ Conditional branch addressing modes:
   ■ Register. Same as for general addressing mode.
   ■ PC-relative. A signed 16-bit displacement is added to the PC.


The various indirect addressing options available for the ’C31 are shown in
Table 2-2. The table shows the options, along with the value of the modification
(mod) field, assembler syntax, operation, and function for each.


Table 2-2. Indirect Addressing


Mod Field  Syntax          Operation                 Description

Indirect Addressing With Displacement
00010      *++ARn(disp)    addr = ARn + disp         With predisplacement add and modify
                           ARn = ARn + disp
00011      *--ARn(disp)    addr = ARn - disp         With predisplacement subtract and modify
                           ARn = ARn - disp
00100      *ARn++(disp)    addr = ARn                With postdisplacement add and modify
                           ARn = ARn + disp
00101      *ARn--(disp)    addr = ARn                With postdisplacement subtract and modify
                           ARn = ARn - disp
00110      *ARn++(disp)%   addr = ARn                With postdisplacement add and circular modify
                           ARn = circ(ARn + disp)
00111      *ARn--(disp)%   addr = ARn                With postdisplacement subtract and circular modify
                           ARn = circ(ARn - disp)

Indirect Addressing With Index Register IR0
01000      *+ARn(IR0)      addr = ARn + IR0          With preindex (IR0) add
01001      *-ARn(IR0)      addr = ARn - IR0          With preindex (IR0) subtract
01010      *++ARn(IR0)     addr = ARn + IR0          With preindex (IR0) add and modify
                           ARn = ARn + IR0
01011      *--ARn(IR0)     addr = ARn - IR0          With preindex (IR0) subtract and modify
                           ARn = ARn - IR0
01100      *ARn++(IR0)     addr = ARn                With postindex (IR0) add and modify
                           ARn = ARn + IR0
01101      *ARn--(IR0)     addr = ARn                With postindex (IR0) subtract and modify
                           ARn = ARn - IR0
01110      *ARn++(IR0)%    addr = ARn                With postindex (IR0) add and circular modify
                           ARn = circ(ARn + IR0)
01111      *ARn--(IR0)%    addr = ARn                With postindex (IR0) subtract and circular modify
                           ARn = circ(ARn - IR0)

Indirect Addressing With Index Register IR1
10000      *+ARn(IR1)      addr = ARn + IR1          With preindex (IR1) add
10001      *-ARn(IR1)      addr = ARn - IR1          With preindex (IR1) subtract
10010      *++ARn(IR1)     addr = ARn + IR1          With preindex (IR1) add and modify
                           ARn = ARn + IR1
10011      *--ARn(IR1)     addr = ARn - IR1          With preindex (IR1) subtract and modify
                           ARn = ARn - IR1
10100      *ARn++(IR1)     addr = ARn                With postindex (IR1) add and modify
                           ARn = ARn + IR1
10101      *ARn--(IR1)     addr = ARn                With postindex (IR1) subtract and modify
                           ARn = ARn - IR1
10110      *ARn++(IR1)%    addr = ARn                With postindex (IR1) add and circular modify
                           ARn = circ(ARn + IR1)
10111      *ARn--(IR1)%    addr = ARn                With postindex (IR1) subtract and circular modify
                           ARn = circ(ARn - IR1)

Indirect Addressing (Special Cases)
11001      *ARn++(IR0)B    addr = ARn                With postindex (IR0) add and bit-reversed modify
                           ARn = B(ARn + IR0)

LEGEND:
addr   = memory address
ARn    = auxiliary register AR0-AR7
IRn    = index register IR0 or IR1
disp   = displacement
++     = add and modify
--     = subtract and modify
circ() = address in circular addressing
%      = where circular addressing is performed
B      = where bit-reversed addressing is performed
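
The bit-reversed update B(ARn + IR0) in the table is typically used to step through FFT data. The sketch below models it as an addition whose carry propagates from the most significant bit toward the least significant bit, which is the usual description of bit-reversed addressing and is an assumption here, since this chapter does not define B(). It also assumes a power-of-two transform size, a buffer base aligned to that size, and IR0 set to half the size. All function names are hypothetical.

```python
def reverse_bits(value, width):
    """Reverse the order of the low `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_add(arn, ir0, width):
    """Add ir0 to arn with reverse carry propagation over the low `width` bits."""
    mask = (1 << width) - 1
    total = reverse_bits(arn & mask, width) + reverse_bits(ir0 & mask, width)
    return (arn & ~mask) | reverse_bits(total & mask, width)


def fft_order(base, size):
    """Addresses visited by repeated *ARn++(IR0)B accesses with IR0 = size / 2."""
    width = size.bit_length() - 1
    addr, order = base, []
    for _ in range(size):
        order.append(addr)
        addr = bit_reversed_add(addr, size // 2, width)
    return order


print(fft_order(0, 8))   # [0, 4, 2, 6, 1, 5, 3, 7]
```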


2.2.6 Instruction Set Summary 


The ’C31 offers instructions for both embedded control and numeric support.
The following tables show each instruction’s mnemonic, description, and
operation. Table 2-3 shows the system control instructions; Table 2-4 lists the
program flow control instructions; Table 2-5 shows the logical and
bit-manipulation instructions; Table 2-6 lists the load and store instructions;
Table 2-7 shows the arithmetic instructions; and Table 2-8 summarizes the
TMS320C31 parallel instructions, which execute in a single cycle.
Table 2-3. System Control Instruction Summary


Mnemonic    Description                             Operation

IACK        Interrupt acknowledge                   Dummy read of src
                                                    IACK toggled low, then high
IDLE        Idle until interrupt                    PC + 1 → PC
                                                    Idle until next interrupt
POP         Pop integer from stack                  *SP-- → Dreg
RETIcond    Return from interrupt conditionally     If cond = true or missing:
                                                      *SP-- → PC
                                                      1 → ST(GIE)
                                                    Else, continue
RETScond    Return from subroutine conditionally    If cond = true or missing:
                                                      *SP-- → PC
                                                    Else, continue
SIGI        Signal, interlocked                     Signal interlocked operation
                                                    Wait for interlock acknowledge
                                                    Clear interlock
SWI         Software interrupt                      Perform emulator interrupt sequence
TRAPcond    Trap conditionally                      If cond = true or missing:
                                                      Next PC → *++SP
                                                      Trap vector N → PC
                                                      0 → ST(GIE)
                                                    Else, continue

LEGEND:
src   = general addressing modes                 Dreg  = register address (any register)
src1  = three-operand addressing modes           Rn    = register address (R7-R0)
src2  = three-operand addressing modes           Daddr = destination memory address
Csrc  = conditional-branch addressing modes      ARn   = auxiliary register n (AR7-AR0)
Sreg  = register address (any register)          addr  = 24-bit immediate address (label)
count = shift value (general addressing modes)   cond  = condition code (see Chapter 11)
SP    = stack pointer                            ST    = status register
GIE   = global interrupt enable bit              RE    = repeat end address register
RM    = repeat mode bit                          RS    = repeat start address register
TOS   = top of stack                             PC    = program counter
C     = carry bit
Table 2-4. Program Flow Control Instruction Summary


Mnemonic    Description                             Operation

Bcond       Branch conditionally (standard)         If cond = true:
                                                      If Csrc is a register, Csrc → PC
                                                      If Csrc is a value, Csrc + PC → PC
                                                    Else, PC + 1 → PC
BcondD      Branch conditionally (delayed)          If cond = true:
                                                      If Csrc is a register, Csrc → PC
                                                      If Csrc is a value, Csrc + PC + 3 → PC
                                                    Else, PC + 1 → PC
BR          Branch unconditionally (standard)       Value → PC
BRD         Branch unconditionally (delayed)        Value → PC
CALL        Call subroutine                         PC + 1 → TOS
                                                    Value → PC
CALLcond    Call subroutine conditionally           If cond = true:
                                                      PC + 1 → TOS
                                                      If Csrc is a register, Csrc → PC
                                                      If Csrc is a value, Csrc + PC → PC
                                                    Else, PC + 1 → PC
DBcond      Decrement and branch conditionally      ARn - 1 → ARn
            (standard)                              If cond = true and ARn ≥ 0:
                                                      If Csrc is a register, Csrc → PC
                                                      If Csrc is a value, Csrc + PC + 1 → PC
                                                    Else, PC + 1 → PC
DBcondD     Decrement and branch conditionally      ARn - 1 → ARn
            (delayed)                               If cond = true and ARn ≥ 0:
                                                      If Csrc is a register, Csrc → PC
                                                      If Csrc is a value, Csrc + PC + 3 → PC
                                                    Else, PC + 1 → PC
RPTB        Repeat block of instructions            src → RE
                                                    1 → ST(RM)
                                                    Next PC → RS
RPTS        Repeat single instruction               src → RC
                                                    1 → ST(RM)
                                                    Next PC → RS
                                                    Next PC → RE

LEGEND: Same as for Table 2-3.


Table 2-5. Logical and Bit Manipulation Instruction Summary


Mnemonic    Description                             Operation

ANDN3       Bitwise logical-ANDN (3-operand)        src1 AND (NOT src2) → Dreg
CMPF        Compare floating-point values           Set flags on Rn - src
CMPF3       Compare floating-point values           Set flags on src1 - src2
            (3-operand)
CMPI        Compare integers                        Set flags on Dreg - src
CMPI3       Compare integers (3-operand)            Set flags on src1 - src2
OR          Bitwise logical-OR                      Dreg OR src → Dreg
OR3         Bitwise logical-OR (3-operand)          src1 OR src2 → Dreg
XOR         Bitwise exclusive-OR                    Dreg XOR src → Dreg

LEGEND: Same as for Table 2-3.


Table 2-6. Load and Store Instruction Summary


Mnemonic    Description                               Operation

LDE         Load floating-point exponent              src(exponent) → Rn(exponent)
LDFcond     Load floating-point value conditionally   If cond = true, src → Rn
                                                      Else, Rn is not changed
LDFI        Load floating-point value, interlocked    Signal interlocked operation
                                                      src → Rn
LDIcond     Load integer conditionally                If cond = true, src → Dreg
                                                      Else, Dreg is not changed
STII        Store integer, interlocked

LEGEND: Same as for Table 2-3.
Table 2-7. Arithmetic Instruction Set Summary


Mnemonic    Description                               Operation

ASH         Arithmetic shift                          If count ≥ 0:
                                                        (Shifted Dreg left by count) → Dreg
                                                      Else:
                                                        (Shifted Dreg right by |count|) → Dreg
ASH3        Arithmetic shift (3-operand)              If count ≥ 0:
                                                        (Shifted src left by count) → Dreg
                                                      Else:
                                                        (Shifted src right by |count|) → Dreg
FIX         Convert floating-point value to integer   Fix(src) → Dreg
FLOAT       Convert integer to floating-point value   Float(src) → Rn
LSH         Logical shift                             If count ≥ 0:
                                                        (Dreg left-shifted by count) → Dreg
                                                      Else:
                                                        (Dreg right-shifted by |count|) → Dreg
LSH3        Logical shift (3-operand)                 If count ≥ 0:
                                                        (src left-shifted by count) → Dreg
                                                      Else:
                                                        (src right-shifted by |count|) → Dreg
NORM        Normalize floating-point value            Normalize(src) → Rn
RND         Round floating-point value                Round(src) → Rn
ROL         Rotate left                               Dreg rotated left 1 bit → Dreg
ROLC        Rotate left through carry                 Dreg rotated left 1 bit through carry → Dreg
ROR         Rotate right                              Dreg rotated right 1 bit → Dreg
SUBB3       Subtract integers with borrow             src1 - src2 - C → Dreg
            (3-operand)
SUBC        Subtract integers conditionally           If Dreg - src ≥ 0:
                                                        [(Dreg - src) << 1] OR 1 → Dreg
                                                      Else, Dreg << 1 → Dreg
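
The conditional-subtract operation at the end of Table 2-7 is the building block of a shift-and-subtract integer divide. The following sketch models the single step exactly as the table states it and then repeats it to divide two unsigned integers. The surrounding loop, the alignment of the divisor, and the function names are illustrative assumptions, not part of this chapter, and the model is valid only while intermediate values fit in 32 bits.

```python
MASK32 = 0xFFFFFFFF


def subc(dreg, src):
    """One conditional-subtract step on unsigned 32-bit values."""
    diff = dreg - src
    if diff >= 0:
        return ((diff << 1) | 1) & MASK32   # subtract succeeded: shift in a 1
    return (dreg << 1) & MASK32             # subtract failed: shift in a 0


def divide(dividend, divisor):
    """Unsigned divide built from repeated conditional subtracts."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    # Align the divisor: smallest k with dividend < divisor << (k + 1)
    k = 0
    while (divisor << (k + 1)) <= dividend:
        k += 1
    reg = dividend
    aligned = divisor << k
    for _ in range(k + 1):
        reg = subc(reg, aligned)
    quotient = reg & ((1 << (k + 1)) - 1)   # low bits collect the quotient
    remainder = reg >> (k + 1)              # high bits hold the remainder
    return quotient, remainder


print(divide(100, 7))   # (14, 2)
```

Each step produces one quotient bit, so a divide costs one conditional subtract per significant bit of the quotient.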
Table 2-8. Parallel Instruction Set Summary


Mnemonic    Description                             Operation

Parallel Arithmetic With Store Instructions

ADDF3       Add floating-point                      src1 + src2 → dst1
|| STF                                              || src3 → dst2
AND3        Bitwise logical-AND                     src1 AND src2 → dst1
|| STI                                              || src3 → dst2
ASH3        Arithmetic shift                        If count ≥ 0:
|| STI                                                src2 << count → dst1
                                                      || src3 → dst2
                                                    Else:
                                                      src2 >> |count| → dst1
                                                      || src3 → dst2
LDF         Load floating-point                     src2 → dst1
|| STF                                              || src3 → dst2
LDI         Load integer                            src2 → dst1
|| STI                                              || src3 → dst2
LSH3        Logical shift                           If count ≥ 0:
|| STI                                                src2 << count → dst1
                                                      || src3 → dst2
                                                    Else:
                                                      src2 >> |count| → dst1
                                                      || src3 → dst2
MPYF3       Multiply floating-point                 src1 x src2 → dst1
|| STF                                              || src3 → dst2
MPYI3       Multiply integer                        src1 x src2 → dst1
|| STI                                              || src3 → dst2
NEGF        Negate floating-point                   0 - src2 → dst1
|| STF                                              || src3 → dst2
NEGI        Negate integer                          0 - src2 → dst1
|| STI                                              || src3 → dst2
NOT         Complement                              NOT src1 → dst1
|| STI                                              || src3 → dst2
OR3         Bitwise logical-OR                      src1 OR src2 → dst1
|| STI                                              || src3 → dst2
SUBF3       Subtract floating-point                 src1 - src2 → dst1
|| STF                                              || src3 → dst2
SUBI3       Subtract integer                        src1 - src2 → dst1
|| STI                                              || src3 → dst2
XOR3        Bitwise exclusive-OR                    src1 XOR src2 → dst1
|| STI                                              || src3 → dst2

Parallel Load Instructions

LDF         Load floating-point                     src2 → dst1
|| LDF                                              || src4 → dst2
LDI         Load integer                            src2 → dst1
|| LDI                                              || src4 → dst2

Parallel Multiply and Add/Subtract Instructions

MPYF3       Multiply and add floating-point         op1 x op2 → op3
|| ADDF3                                            || op4 + op5 → op6
MPYF3       Multiply and subtract floating-point    op1 x op2 → op3
|| SUBF3                                            || op4 - op5 → op6
MPYI3       Multiply and add integer                op1 x op2 → op3
|| ADDI3                                            || op4 + op5 → op6
MPYI3       Multiply and subtract integer           op1 x op2 → op3
|| SUBI3                                            || op4 - op5 → op6

Parallel Store Instructions

STF         Store floating-point                    src1 → dst1
|| STF                                              || src3 → dst2
STI         Store integer                           src1 → dst1
|| STI                                              || src3 → dst2

LEGEND:
src1 = register addr (R7-R0)       src2 = indirect addr (disp = 0, 1, IR0, IR1)
src3 = register addr (R7-R0)       src4 = indirect addr (disp = 0, 1, IR0, IR1)
dst1 = register addr (R7-R0)       dst2 = indirect addr (disp = 0, 1, IR0, IR1)
op3  = register addr (R0 or R1)    op6  = register addr (R2 or R3)
op1, op2, op4, op5 = Two of these operands must be specified using register addr,
                     and two must be specified using indirect.
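
The parallel multiply-and-add form is what makes a single-cycle multiply-accumulate loop possible. The sketch below models a dot product built on it. It assumes that both halves of a parallel instruction read their operands before either half writes its result, so the add in each step consumes the product from the previous step. The register choice (R0 for op3, R2 for op6) follows the legend above, and the function names are hypothetical.

```python
def parallel_mpy_add(regs, a, b):
    """One multiply || add step; both halves read before either writes."""
    product = a * b                    # op1 x op2, two memory operands
    total = regs["R0"] + regs["R2"]    # op4 + op5, two register operands
    regs["R0"] = product               # op3: R0 or R1
    regs["R2"] = total                 # op6: R2 or R3


def dot_product(xs, ys):
    regs = {"R0": 0.0, "R2": 0.0}
    for a, b in zip(xs, ys):
        parallel_mpy_add(regs, a, b)
    return regs["R2"] + regs["R0"]     # one final add picks up the last product


print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))   # 32.0
```

The two memory operands and two register operands in each step match the rule that two of op1, op2, op4, and op5 must be registers and two must be indirect.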


2.3 Memory Organization


The total memory space of the TMS320C31 is 16 megawords (32 bits each). 
Program, data, and I/O space are contained within this 16-megaword address 
space, allowing tables, program code, or data to be stored in either RAM or 
ROM. This single address space allows you to maximize the use of the 
memory space and to partition it as desired. 


2.3.1 RAM, ROM, and Cache


Figure 2-3 shows how the memory is organized on the TMS320C31. RAM
blocks 0 and 1 are 1K x 32 bits each. Each RAM and ROM block is capable
of supporting two CPU accesses in a single cycle. The ’C31 also has an
on-chip bootloader ROM, which allows programs stored in off-chip memory or
transferred through the serial port to be loaded anywhere in the memory map.
The separate program buses, data buses, and DMA buses allow parallel
program fetches, data reads and writes, and DMA operations. For example, the
CPU can access a data value in one RAM block and perform an external
program fetch in parallel with the DMA loading another RAM block, all within a
single cycle.


A 64 x 32-bit instruction cache is provided to store frequently repeated sections
of code, thus greatly reducing the number of off-chip accesses necessary. This
allows code to be stored off-chip in slower, lower-cost memories. The external
buses are also freed for use by the DMA, external memory fetches, or other
devices in the system.


Figure 2-3. Memory Organization


2.3.2 Memory Maps


There are two TMS320C31 memory maps. Which one applies depends on
whether the processor is running in the microprocessor mode (MCBL/MP = 0)
or the bootloader mode (MCBL/MP = 1). The memory maps are similar (see
Figure 2-4). All of the memory-mapped peripheral registers are in locations
808000h through 8097FFh. In both modes, RAM block 0 is located at addresses
809800h through 809BFFh, and RAM block 1 is located at addresses 809C00h
through 809FFFh.


In microprocessor mode, the bootloader ROM is not mapped into the
TMS320C31 memory map. Locations 0h through 0BFh consist of interrupt
vector, trap vector, and reserved locations, all of which are accessed over the
external memory port (STRB active). Locations 0C0h through 7FFFFFh and
locations 80A000h through 0FFFFFFh are also accessed using STRB.


In bootloader mode, the bootloader ROM is mapped into locations 0h through
0FFFh. There are 192 locations (0h through 0BFh) within this block for the
’C31 bootloader program. Locations 1000h through 7FFFFFh and locations
80A000h through 0FFFFFFh are also accessed using STRB.
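
The two maps can be summarized as a small address classifier. The sketch uses only the ranges stated in this section. Addresses that the text does not describe are reported as such rather than guessed, and the function name and region labels are illustrative.

```python
def classify(addr, bootloader_mode):
    """Return the region of the 24-bit memory map that contains addr."""
    if not 0 <= addr <= 0xFFFFFF:
        raise ValueError("address is outside the 16-megaword space")
    if 0x808000 <= addr <= 0x8097FF:
        return "peripheral registers"
    if 0x809800 <= addr <= 0x809BFF:
        return "RAM block 0"
    if 0x809C00 <= addr <= 0x809FFF:
        return "RAM block 1"
    if bootloader_mode and addr <= 0xFFF:
        return "bootloader ROM"
    if addr <= 0x7FFFFF or addr >= 0x80A000:
        return "external (STRB)"
    return "not described in this section"


for address in (0x000000, 0x809800, 0x80A000):
    print(hex(address), classify(address, bootloader_mode=True))
```

Only the lowest 4K locations differ between the two modes, so the same classifier serves both maps with a single flag.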


Figure 2-4. TMS320C31 Memory Maps


2.4 Internal Bus Operation


A large portion of the TMS320C31’s high performance is due to internal busing
and parallelism. The separate program buses (PADDR and PDATA), data
buses (DADDR1, DADDR2, and DDATA), and DMA buses (DMAADDR and
DMADATA) allow for parallel program fetches, data accesses, and DMA
accesses. These buses connect all of the physical spaces (on-chip memory,
off-chip memory, and on-chip peripherals) supported by the TMS320C31.
Figure 2-3 shows these internal buses and their connection to on-chip and
off-chip memory blocks.


The program counter (PC) is connected to the 24-bit program address bus 
(PADDR). The instruction register (IR) is connected to the 32-bit program data 
bus (PDATA). These buses can fetch a single instruction word every machine 
cycle. 


The 24-bit data address buses (DADDR1 and DADDR2) and the 32-bit data
bus (DDATA) support two data memory accesses every machine cycle.
The DDATA bus carries data to the CPU over the CPU1 and CPU2 buses. The
CPU1 and CPU2 buses can carry two data memory operands to the multiplier,
ALU, and register file every machine cycle. Also internal to the CPU are register
buses REG1 and REG2 that can carry two data values from the register file
to the multiplier and ALU every machine cycle. Figure 2-2 shows the buses
internal to the CPU section of the processor.


The DMA controller is supported with a 24-bit address bus (DMAADDR) and 
a 32-bit data bus (DMADATA). These buses allow the DMA to perform memory 
accesses in parallel with the memory accesses occurring from the data and 
program buses. 


2.5 On-Chip Peripherals 


All TMS320C31 peripherals are controlled through memory-mapped registers
on a dedicated peripheral bus. The peripheral bus is composed of a 32-bit data
bus and a 24-bit address bus. The peripheral bus permits straightforward
communication to the peripherals. The TMS320C31 peripherals include two timers
and one serial port. Figure 2-5 shows the peripherals with associated buses
and signals.


Figure 2-5. Peripheral Modules


2.5.1 Timers


The two timer modules are general-purpose 32-bit timer/event counters with
two signaling modes and internal or external clocking. Each timer has an I/O
pin that can be used as an input clock to the timer or as an output signal driven
by the timer. The pin may also be configured as a general-purpose I/O pin.


2.5.2 Serial Port


The TMS320C31 offers a full-duplex, synchronous serial port, which can be
used as a general system interface or as a glueless connection to an external
analog converter. The serial port can be configured to transfer 8, 16, 24, or 32
bits of data per word. The clock for the serial port can originate either internally
or externally. An internally generated divide-down clock is provided. The serial
port can also be configured as timers or bit I/O pins. A special handshake
mode allows TMS320C31s to communicate via their serial ports with automatic
synchronization.


2.6 Direct Memory Access (DMA) 


The on-chip DMA controller can read from or write to any location in the
memory map without interfering with the operation of the CPU. The DMA
controller can be configured to synchronize transfers with external, serial port, or
timer interrupts. Therefore, the TMS320C31 can interface to slow memories
and to on-chip and system peripherals without reducing throughput to the
CPU. The DMA controller contains its own address generators, source and
destination registers, and transfer counter. Dedicated on-chip DMA address
and data buses minimize conflicts between the CPU and the DMA controller
for on-chip resources. A DMA operation consists of a block or single-word
transfer to or from memory. Figure 2-6 shows the DMA controller with
associated buses.


Figure 2-6. DMA Controller


2.7 External Bus Operation


The TMS320C31 primary bus is the external memory interface. The primary 
bus consists of a 24-bit address bus, 32-bit data bus, and a set of control sig- 
nals. It can be used to address external program/data memory or I/O space. 
The bus has an external ready signal (RDY), which can be used in conjunction
with on-chip, software-controlled wait-state generation. See Table 2-9
for a description of the TMS320C31 external signals.


2.7.1 External Bus Control Features


The TMS320C31 external bus provides flexibility to implement different types
of memory systems. The STRB control signal remains active between
consecutive read cycles to the same bank of memory, allowing high-speed SRAM and
static-column decode accesses. In addition, the primary bus has a programmable
bank switching feature, providing more time for address decoding and
memory turn-off when a bank boundary is crossed.


2.7.2 Multiprocessor Support


The TMS320C31 supports shared-memory multiprocessor systems through
its HOLD and hold acknowledge (HOLDA) signals. When the HOLD input is
asserted, the primary bus control, address, and data bus signals go into a
high-impedance state after the current bus cycle is complete. The HOLDA output
acknowledges that the ’C31 primary bus has gone into the high-impedance state.


Interlocked operations ease the implementation of multiprocessor operations
such as busy-wait loops, shared counter manipulation, and semaphores. The
TMS320C31 supports interlocked operations through its XF0 and XF1 pins
and dedicated interlocked operation instructions. XF0 and XF1 can also be
used as bit I/O signals.


2.8 Interrupts


The TMS320C31 supports four external interrupts (INT3-INT0), a number of
internal peripheral interrupts, 28 software interrupts (traps), and a nonmaskable
external RESET signal. The external and internal peripheral interrupts
can be used to interrupt either the DMA or the CPU. When the CPU responds
to the interrupt, the IACK pin can be used to signal an external interrupt
acknowledge. Typical interrupt latency times are less than 1 µs for a 50-ns
TMS320C31.


2.9 TMS320C31 Signal Descriptions


Table 2-9 describes the external signals of the TMS320C31. They are listed 
according to the signal name; the number of pins allocated; the input (I), output 
(O), or high-impedance state (Z) operating modes; a brief description of the 
signal’s function; and the condition that places an output pin in high
impedance. A line over a signal name (for example, RESET) indicates that the
signal is active low (true at a logic 0 level).


Table 2-9. TMS320C31 Signal Descriptions 


Signal     # Pins  I/O/Z†  Description                                       Condition When
                                                                             Signal Is in High Z‡


Primary Bus Interface (61 Pins)


R/W                        Read/write signal. This pin is high when a read
                           is performed; low when a write is performed
                           over the parallel interface.

STRB                       External access strobe.                           S, H

RDY                        Ready signal. This pin indicates that the
                           external device is prepared for a transaction
                           completion.

HOLD       1               Hold signal. When HOLD is a logic low, any
                           ongoing transaction is completed. The A23-A0,
                           D31-D0, STRB, and R/W signals are placed in a
                           high-impedance state, and all transactions over
                           the primary bus interface are held until HOLD
                           becomes a logic high, or the NOHOLD bit of the
                           primary bus control register is set.

HOLDA                      Hold acknowledge signal. This signal is
                           generated in response to a logic low on HOLD.
                           It signals that A23-A0, D31-D0, STRB, and R/W
                           are placed in a high-impedance state and that
                           all transactions over the bus will be held.
                           HOLDA will be high in response to a logic high
                           of HOLD, or the NOHOLD bit of the primary bus
                           control register is set.


† Input (I), output (O), high-impedance state (Z).
‡ S = SHZ active, H = Hold active, R = Reset active.
Table 2-9. TMS320C31 Signal Descriptions (Continued)


Signal     # Pins  I/O/Z†  Description                                       Condition When
                                                                             Signal Is in High Z‡


Control Signals (10 Pins)


RESET      1               Reset. When this pin is a logic low, the device
                           is placed in the reset condition. When reset
                           becomes a logic 1, execution begins from the
                           location specified by the reset vector.

IACK       1       O/Z     Interrupt acknowledge signal. IACK is active      S
                           during the IACK instruction. This can be used
                           to indicate the beginning or end of an
                           interrupt service routine.

MCBL/MP                    Microcomputer boot loader/microprocessor mode
                           pin.

SHZ        1               Shut down high Z. An active low shuts down the
                           TMS320C31 and places all pins in a
                           high-impedance state. This signal is used for
                           board-level testing to ensure that no dual
                           drive conditions occur. CAUTION: An active low
                           on the SHZ pin corrupts TMS320C31 memory and
                           register contents. Reset the device with an
                           SHZ = 1 to restore it to a known operating
                           condition.

XF1, XF0   2       I/O/Z   External flag pins. They are used as
                           general-purpose I/O pins or to support
                           interlocked processor instructions.


Serial Port 0 Signals (6 Pins)


CLKR0      1       I/O/Z   Serial port 0 receive clock. This pin serves as
                           the serial shift clock for the serial port 0
                           receiver.

CLKX0      1       I/O/Z   Serial port 0 transmit clock. This pin serves
                           as the serial shift clock for the serial port 0
                           transmitter.

DR0        1       I/O/Z   Data receive. Serial port 0 receives serial
                           data via the DR0 pin.

DX0        1       I/O/Z   Data transmit output. Serial port 0 transmits
                           serial data on this pin.

FSR0       1       I/O/Z   Frame synchronization pulse for receive. The      S
                           FSR0 pulse initiates the receive data process
                           over DR0.

FSX0       1       I/O/Z   Frame synchronization pulse for transmit. The     S, R
                           FSX0 pulse initiates the transmit data process
                           over pin DX0.


† Input (I), output (O), high-impedance state (Z).
‡ S = SHZ active, H = Hold active, R = Reset active.
Table 2-9. TMS320C31 Signal Descriptions (Concluded)


Signal     # Pins  I/O/Z†  Description                                       Condition When
                                                                             Signal Is in High Z‡


Timer Signals (2 Pins)


TCLK0                      Timer clock 0. As an input, TCLK0 is used by
                           timer 0 to count external pulses. As an output
                           pin, TCLK0 outputs pulses generated by timer 0.

TCLK1                      Timer clock 1. As an input, TCLK1 is used by
                           timer 1 to count external pulses. As an output
                           pin, TCLK1 outputs pulses generated by timer 1.


Supply and Oscillator Signals (49 Pins)


H1                         External H1 clock. This clock has a period
                           equal to twice CLKIN.

H3                         External H3 clock. This clock has a period
                           equal to twice CLKIN.

VDD        20              +5-VDC supply pins. All pins must be connected
                           to a common supply plane.§

VSS        25              Ground pins. All ground pins must be connected
                           to a common ground plane.

X1         1       O/Z     Output pin from the internal crystal              S
                           oscillator. If a crystal is not used, this pin
                           should be left unconnected.

X2/CLKIN                   The internal oscillator input pin from a
                           crystal or a clock.


Reserved (4 Pins)¶


EMU2-EMU0,                 Reserved. Use 20-kΩ pull-up resistors to
EMU3                       +5 volts.


† Input (I), output (O), high-impedance state (Z).
‡ S = SHZ active, H = Hold active, R = Reset active.
§ Recommended decoupling capacitor value is 0.1 µF.
¶ Follow the connections specified for the reserved pins. 18- to 22-kΩ
  pull-up resistors are recommended. All +5 volt supply pins must be
  connected to a common supply plane, and all ground pins must be connected
  to a common ground plane.


Chapter 3


TMS320C31 Features/Performance 
Comparison 


This chapter compares the device features and performance of the 
TMS320C31 to other embedded controllers. The TMS320C31’s CPU pro- 
vides higher system and numeric performance than CISC microprocessors 
and microcontrollers and also provides higher sustained numeric performance 
than RISC embedded controllers. The TMS320C31 also incorporates several 
peripherals on-chip, which helps reduce system cost and complexity. It also 
possesses a significant amount of on-chip memory, which facilitates the real- 
time execution of time-critical routines, reducing the need for expensive, high- 
speed external memory. 


The topics discussed include: 


Topic


3.1 TMS320C31 Feature Comparison Versus Other Embedded Controllers


3.2 TMS320C31 Benchmark Performance Versus Other Embedded Controllers




3.1 TMS320C31 Feature Comparison Versus Other Embedded Controllers 


Table 3-1 lists and describes the fields shown in Table 3-2. Table 3-2
highlights the features and performance of several embedded controllers in
the same price range, including the TMS320C31.


Table 3-1. Description of the Fields in Table 3-2


Multiply Time (ns) Integ/Float   The time the processor takes to perform a
                                 single, nonpipelined integer multiply/
                                 floating-point multiply.


Table 3-2. Feature/Performance Comparison of Embedded Controllers 


Device-MHz   On-Chip RAM   Peripherals                        Multiply Time (ns)
             (Bytes)       Serial Ports   Timer   DMA Chan    Integ/Float


Key:
NA = The device does not support this feature in hardware.


3.2 TMS320C31 Benchmark Performance Versus Other Embedded Controllers


The best method to evaluate a processor’s performance in a given application 
is to benchmark the execution time of the applications software under target 
system constraints. The next best evaluation method is to benchmark the per- 
formance of similar code or code that is representative of the target applica- 
tion. However, due to short product development cycles, the processor evalu- 
ation period is rarely long enough to do the code development and system 
emulation necessary to perform such a rigorous performance analysis for 
each candidate device. Consequently, many system designers use published 
device benchmarks to obtain rough performance estimates for different 
classes of algorithms. 


Table 3-3 shows the published manufacturer benchmarks for several C lan- 
guage programs. These benchmarks have been used by processor manufac- 
turers to highlight the general performance of their devices and are a subset 
of a group of benchmarks referred to as the “Intel Intro Benchmarks”. Even 
though these benchmarks do not necessarily reflect controller performance for 
many realtime applications, the results are presented here to illustrate that 
high system-control performance can be achieved with the TMS320C31 using 
high-level language code. “Intel Intro Benchmarks” results for embedded pro- 
cessors at the same price level as the TMS320C31 are also shown in 
Table 3-3 to show that the TMS320C31 is a low-cost, high-performance solu- 
tion relative to other embedded controllers. 


Table 3-3. Benchmark Comparison of the TMS320C31 With Embedded Controllers at the
Same Price Level


Benchmark (Units)      ’C3x(1)       AMD29000(2)      i960KA(3)
                       60 ns         60 ns            40 ns
                                     (YARC Board)


Dhrystones/sec         *32,237       *24,388          *23,423


Notes: 1) The ’C31 benchmarks were run on the Texas Instruments ’C3x
          application board using zero wait-state SRAM. The C code was
          compiled using the TMS320 Floating-Point DSP Optimizing C
          compiler. The benchmarks yield the same results for both the ’C30
          and ’C31.

       2) AMD29000 results are taken from an AMD application note, Intel
          i960CA Benchmark Report Critique by Tim Olson.

       3) The i960KA and 68030 numbers are from the February 1990 issue of
          Electronic Engineering.

       4) An asterisk (*) denotes compiler in-lining of application
          functions. Without using in-lining, the TMS320C31 provides
          24,876 Dhrystones/sec.


3.2.1 Dhrystone Benchmark


The Dhrystone benchmark was originally used to measure device perfor- 
mance and compiler efficiency in typical host CPU integer applications. It does 
not include input/output or operating system operations. In Table 3-3, the re- 
sults for Dhrystone version 1.1 are shown due to the widespread availability 
of processor benchmark results for version 1.1 over later versions of the 
benchmark. 


3.2.2 Bubble- and Quick-Sort Benchmarks 


The bubble-sort program performs a bubble sort on an array of elements, and 
the quick-sort program uses the quick-sorting algorithm to sort an array of ele- 
ments. 
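

For readers unfamiliar with the two algorithms, a minimal C sketch of each
follows. It is not the benchmark source; the int element type, the function
names, and the middle-element pivot choice are assumptions made for
illustration.

```c
#include <stddef.h>

/* Bubble sort: repeatedly swap adjacent out-of-order elements. */
void bubble_sort(int *a, size_t n)
{
    if (n < 2)
        return;
    for (size_t pass = 0; pass < n - 1; ++pass) {
        int swapped = 0;
        for (size_t i = 0; i + 1 < n - pass; ++i) {
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                swapped = 1;
            }
        }
        if (!swapped)
            break;              /* already sorted: stop early */
    }
}

/* Quick sort on the inclusive range [lo, hi], middle-element pivot. */
static void quick_sort_range(int *a, long lo, long hi)
{
    if (lo >= hi)
        return;
    int pivot = a[lo + (hi - lo) / 2];
    long i = lo, j = hi;
    while (i <= j) {
        while (a[i] < pivot) ++i;
        while (a[j] > pivot) --j;
        if (i <= j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            ++i;
            --j;
        }
    }
    quick_sort_range(a, lo, j);
    quick_sort_range(a, i, hi);
}

void quick_sort(int *a, size_t n)
{
    if (n > 1)
        quick_sort_range(a, 0, (long)n - 1);
}
```

Both routines are dominated by compares, branches, and indexed memory
accesses, which is why they are used as system-control benchmarks.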


3.2.3 matmult Benchmark


matmult is a routine that multiplies two 7x7 matrices together. The 7x7 ma- 
trices are subsets of 8x8 matrices. 
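

The following C sketch shows one way such a routine can be written: the
arrays are declared 8x8, but only the leading 7x7 block is multiplied. It is
not the benchmark source; the float element type and the macro names are
assumptions made for illustration.

```c
#define DIM 8   /* storage dimension of each array */
#define N   7   /* dimension of the sub-matrices that are multiplied */

/* c = a * b over the leading N x N block of DIM x DIM arrays.
 * Row 7 and column 7 of c are left untouched. */
void matmult(float a[DIM][DIM], float b[DIM][DIM], float c[DIM][DIM])
{
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < N; ++k)
                sum += a[i][k] * b[k][j];   /* one multiply per iteration */
            c[i][j] = sum;
        }
    }
}
```

The inner loop performs one multiply and one add per iteration, so the
benchmark rewards processors with fast multiply hardware.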


3.2.4 anneal Benchmark


anneal solves the travelling salesman’s problem: given a number of cities
that the salesman wants to visit, find the shortest route that visits every
city exactly once. The problem is solved using simulated annealing
techniques.


3.2.5 Benchmark Summary 


For the system-control benchmarks described above, the TMS320C31 performs
at the same level as higher-priced devices and, overall, outperforms devices
at the same price level. For the matmult benchmark, the TMS320C31
offers superior results due to its single-cycle multiply support on-chip. These 
benchmarks focus on CPU performance and do not reflect that the 
TMS320C31 possesses more on-chip peripherals than the other processors 
shown. On-chip peripheral integration reduces system cost and complexity 
and is an important consideration in embedded controller selection. 


Chapter 4 


Application Examples 


This chapter presents four application examples that show how the 
TMS320C30 and TMS320C31 have been used to integrate system control 
and signal-processing functions in several application areas. In two of the ex- 
amples, SPOX, a realtime embedded operating system from Spectron Micro- 
Systems, is used to facilitate the integration. For more information on SPOX, 
refer to Chapter 6. The examples discussed are as follows: 


Topic


4.1 Telecommunications Example Using SPOX
4.2 Instrumentation Application and Processor Evaluation Example
4.3 Test Equipment Example Using SPOX


4.1 Telecommunications Example Using SPOX


4.1.1 Speech Recognition With TMS320C31 and SPOX


Voice Processing Corp. (VPC) of Cambridge, Massachusetts, a leader in
speech recognition technology, develops and markets proprietary technology
for speaker-independent continuous and discrete word recognition. VPC has
taken an approach to speech recognition that is particularly well suited to
handling voices over the telephone. Telephone transactions are one area in
which speech recognition technology has a compelling market need.


VPC has been supplying speech recognition technology to telecom system 
manufacturers and over-the-phone service providers for several years, allow- 
ing these firms to replace human operators. VPC recognizers are being used 
in a wide array of applications, such as credit card verification, operator inter- 
cept, telephone order entry, and voice-mail. 


4.1.2 Lower Cost and More Recognizers 


The VPC recognition software requires a high-performance platform that can
execute both signal processing and general-purpose algorithms. Since such
hardware platforms did not exist on the market, in 1989 VPC developed and
built an ISA board with two different processors: the Intel i386 microprocessor
and Texas Instruments TMS320C25 signal processor. All of the cycles of the
ISA board were needed to execute one speaker-independent speech recog- 
nizer in realtime. Since 1989, as their customers required more and more lines 
of speech recognition to automate over-the-phone services, VPC needed a 
new hardware platform that could provide more lines of recognizers at a lower 
cost per line. VPC also needed a more powerful hardware platform to run new 
recognition algorithms being developed in their research lab. 


As VPC engineers saw it, there were two ways to reduce the cost of the speech 
recognition hardware. They could go to faster hardware that would execute 
multiple recognizers per chip or they could pack more recognizers onto a 
single ISA board so they could amortize the board and system cost over more 
recognizers. They also wanted this new platform to give them more power and 
flexibility to handle new algorithms. Some of their customers wanted to port 
different voice functions, such as speech synthesis, to the VPC hardware plat- 
form. To ensure that the hardware platform could be easily reprogrammed, 
VPC wanted to replace their heterogeneous architecture (viz. i386 and ’C25)
with a homogeneous multiprocessing architecture, which makes it much easier
to partition functions across processors. Since the new processor had to
take on the functions of both the 386 and ’C25, the support of a multitasking 
operating system was important. 


The VPC criteria for selecting the processor for their next generation platform
were as follows:


1) The cost of hardware per recognizer.


2) The number of microprocessors (viz. recognizers) they can incorporate on
   a board.


3) C compiler and operating system support for pre-emptive multitasking and
   multiprocessing.


4.1.3 VPRO-4: A Homogeneous Multi-DSP Architecture 


The new platform VPC developed, called the VPRO-4, is an ISA board with
four ’C31s and a shared-memory architecture. Each ’C31 has 512K bytes of
zero-wait-state local memory, and there are 1 to 8 megabytes of multiported
shared memory on the board. All four ’C31s and the PC host can read and write
into this shared memory. A robust set of tokens, semaphores, and interrupts
facilitates interprocessor communications via software-defined memory
structures. Communications with the PC are streamlined by a PC bus I/O-mapped
control port, which provides for unintrusive polling operations. Realtime
voice I/O to a standard voice bus (Dialogic PEB or Natural Microsystems MVIP)
is done over the serial port of the ’C31 via an ASIC interface chip.


Figure 4-1. VPRO-4 Hardware Architecture 


4.1.4 From Tiger 30 to Realtime Recognition


VPC developed software with the Tiger 30 development board from DSP Re- 
search. Two discrete word recognizers could run on a single ’C31—eight rec- 
ognizers on a single ISA board. They also used the board to experiment with 
SPOX to help them understand its capabilities and performance better. 


It took about six months to build the VPRO-4 hardware prototype using the 
Tiger 30 and SPOX. Because the Tiger 30 board did not interface to the voice 
bus, they tested their recognizer with canned voice data stored on the host file 
system. After the VPRO-4 hardware and the necessary low-level software for
loading and interfacing to the board were completed, it took just one day to
move the SPOX realtime kernel and the recognizer software over to the 
VPRO-4 hardware. 


Each ’C31 on the VPRO-4 runs several tasks using the preemptive multitask- 
ing capability of SPOX. A high-priority task moves time-critical voice data to 
and from the voice bus. The bulk of the ’C31 cycles, however, are used for 
speech recognition—it runs one recognition task for continuous word input or 
two recognition tasks for discrete word input. There are also background tasks 
for communicating with the host and other housekeeping functions. 


4.1.5 A New Level of Interoperability 


VPC’s ’C31-based platform gives them a higher performance system and it
lets them serve their customers better. Research continues at VPC to improve 
the recognition algorithms and take advantage of the processing power of the 
VPRO-4. In some customer applications, speech recognition has to be com- 
plemented with other voice functions, such as speech synthesis. The VPRO-4 
makes it easy to port third-party voice algorithms to the DSP platform, signifi- 
cantly reducing total system costs by removing the need for multiple hardware 
platforms. Other VPC customers have their own ’C31/SPOX hardware. The 
commonality in the system environment makes it much easier for VPC to port 
their recognition software to the customer’s hardware. This level of interoper- 
ability is a significant milestone for speech recognition and signal processing 
technology. Over-the-phone service providers can now quickly incorporate 
new voice technology on either VPC’s hardware or their own hardware to suit 
different applications. 


4.2 Instrumentation Application and Processor Evaluation Example


4.2.1 Background and System Description


Nicolet Instruments developed the first digital oscilloscope 20 years ago. They 
have since developed and marketed a variety of other data-acquisition prod- 
ucts based on the concept of digitizing analog waveforms. Although they de- 
sign 8-bit digitizers that collect data at rates of up to 200 million samples/se- 
cond, their product strength is in the higher-precision, lower-speed digitizers 
(10 to 16 bits wide, 1-50 million samples/second) with very long memories 
(greater than 1 million samples). Nicolet’s requirements for an embedded pro- 
cessor were low system cost, and high data movement and numeric proces- 
sing performance. 


Figure 4-2 is a block diagram of a typical Nicolet high-precision
data-acquisition system using a dual-processor architecture. The master CPU
controls the data-acquisition subsystem, which includes the analog
converters, digitizer memory, and arbitration logic. In the current
implementation of this architecture, a CISC processor is used as the master
CPU. The slave processor handles high-speed data transfers within, in, and
out of the system and performs numeric operations on the digitized data. To
perform these operations efficiently, Nicolet wanted a slave processor that
would combine low system cost with high data-movement and numeric-processing
performance using the C language. Nicolet selected the ’C31 due to its
balance of price and performance over RISC solutions. In addition, for
extremely cost-sensitive designs, Nicolet is considering the ’C31 to
integrate the functionality of both the master and slave processors.


Figure 4-2. System Diagram


For Nicolet’s data-acquisition equipment, the processor must move data and
complete calculations in realtime, and also have enough performance to dis- 
play the information in a reasonable amount of time. To fulfill these require- 
ments, the slave processor needed the following characteristics: 


- High data-movement rate

- Fast address-generation capability

- Realtime calculation of waveform pulse parameters

- Floating-point fast Fourier transform (FFT) of input samples to enable
  the frequency domain display of the data

- Performance of other realtime DSP operations including filtering,
  correlation, and convolution


The importance of these device characteristics is illustrated in some of the
algorithms Nicolet uses in its data-acquisition equipment: archive shuffle,
waveform processing, and FFT.


4.2.2 Archive Shuffle


When Nicolet’s equipment digitizes a waveform, the trigger or start point is not 
necessarily at the first location in digitizer memory. The archive shuffle algo- 
rithm moves the trigger point to the first location without using additional data 
memory (in-place data movement). Even though the archive shuffle algorithm 
did not take advantage of a DMA controller, the ’C31 is efficient at performing 
the data shuffle due to its single cycle instructions and auxiliary register arith- 
metic units, which can generate two pointer addresses every instruction cycle. 
Nicolet modified the algorithm to use the ’C31’s on-chip DMA to move blocks
of data in and out of the ’C31, in parallel with the ’C31 CPU calculating the
source and destination addresses of subsequent blocks. With the use of the 
DMA, Nicolet estimated that the time required to shuffle a block of data was 
reduced to 35% of the time required for the non-DMA implementation. 
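

The in-place data movement described above amounts to rotating the record so
that the trigger sample lands at index 0. The C sketch below uses the
well-known three-reversal rotation, which needs no second buffer. It is an
illustration of the idea only and is not claimed to be Nicolet's algorithm;
the function names and the int sample type are assumptions.

```c
#include <stddef.h>

/* Reverse the half-open range buf[first, last) in place. */
static void reverse(int *buf, size_t first, size_t last)
{
    while (first + 1 < last) {
        --last;
        int t = buf[first]; buf[first] = buf[last]; buf[last] = t;
        ++first;
    }
}

/* Rotate buf so that the sample at index 'trigger' moves to index 0,
 * preserving the circular order of all n samples. */
void archive_shuffle(int *buf, size_t n, size_t trigger)
{
    if (n == 0 || trigger >= n)
        return;
    reverse(buf, 0, trigger);   /* reverse the pre-trigger part  */
    reverse(buf, trigger, n);   /* reverse the post-trigger part */
    reverse(buf, 0, n);         /* reverse the whole record      */
}
```

Each reversal walks two pointers toward each other, the kind of access
pattern that benefits from generating two addresses per instruction cycle.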


4.2.3 Waveform Processing 


Waveform processing involves calculating waveform parameters such as 
area, rise time, root-mean-square (RMS), and standard deviation. The wave- 
form processing must be performed on 1K samples fast enough to allow 5-10 
user-screen updates/second. The ’C31 provided more than enough perfor- 
mance to meet the screen update requirements. With its single-cycle multiply 
capability, the ’C31 especially excels in operations that require multiplies in the 
inner loop. In addition to the on-chip hardware math support, the ’C31
performs the waveform calculations quickly due to its 2K words of on-chip,
general-purpose memory and on-chip program cache.
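

A minimal C sketch of some of these waveform calculations follows. The
structure, the names, and the trapezoidal rule for area are assumptions made
for illustration, not Nicolet's code. To stay free of library calls, the
sketch returns the mean square and the variance; RMS and standard deviation
are the square roots of those two values.

```c
#include <stddef.h>

typedef struct {
    double area;         /* trapezoidal integral over the record */
    double mean;
    double mean_square;  /* RMS is the square root of this value */
    double variance;     /* standard deviation is its square root */
} waveform_stats;

/* Single pass over n samples spaced dt apart. */
waveform_stats waveform_measure(const float *x, size_t n, double dt)
{
    waveform_stats s = {0.0, 0.0, 0.0, 0.0};
    if (n == 0)
        return s;

    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double v = x[i];
        sum += v;
        sum_sq += v * v;                     /* multiply in the inner loop */
        if (i > 0)
            s.area += 0.5 * (x[i - 1] + v) * dt;
    }
    s.mean = sum / (double)n;
    s.mean_square = sum_sq / (double)n;
    s.variance = s.mean_square - s.mean * s.mean;  /* population variance */
    return s;
}
```

For a 1K-sample record the loop body runs 1,024 times per screen update, so
the cost of the multiply in the inner loop dominates the calculation.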


4.2.4 Fast Fourier Transform 


The requirements for the floating-point FFT are similar to those for waveform 
processing. The processor must perform a 1K FFT fast enough to allow 10 
screen updates/second. The ’C31 FFT performance far exceeded the user- 
update requirement. And if greater FFT performance was needed, Nicolet ob- 
served that they could use the C-callable, hand-optimized assembly-language 
FFT routines available from Texas Instruments. This is not an option with many 
RISC processors. 


4.2.5 Advantages of a TMS320C31 System


Nicolet explained their choice of a ’C31 as the embedded processor with the
following comments:


1) The ’C31 offers a good balance of data movement and numeric performance
   for the price.


2) The ’C31’s performance is on par with more expensive processors, making
   many of the extra-cost product options either no-cost options or
   extra-margin options.


3) The ’C31 is very efficient at accessing arrays of data due to its ability
   to do auto-increment indirect addressing.


4) The majority of their code consists of small loops, which makes good use
   of the ’C31’s on-chip instruction cache.


5) The ’C31 allows the user to implement algorithms using either
   floating-point or integer math, while achieving the same performance with
   either data format.


6) C-callable, optimized DSP algorithms are available for the ’C31.


7) Code development is not required to build a software monitor for the ’C31.
   A target monitor plugs directly into the target system’s ’C31.


8) The ’C31 has a clear family road-map for higher performance with the
   availability of the ’C3x and ’C4x generations of TMS320s.
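

The array-access comment in item 3 can be illustrated with a short C loop.
The pointer post-increment idiom below is the kind of source construct that
a compiler can map onto auto-increment indirect addressing; whether a given
compiler does so is an assumption here, and the function name is
hypothetical.

```c
#include <stddef.h>

/* Sum an array by walking a pointer through it. Each access reads the
 * current element and advances the pointer in the same expression. */
long sum_samples(const short *samples, size_t count)
{
    const short *p = samples;
    long total = 0;

    while (count-- > 0)
        total += *p++;      /* load, then step to the next element */

    return total;
}
```

The loop body is also small enough to be a good fit for an instruction
cache, which is the point made in item 4.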


4.3 Test Equipment Example Using SPOX


Developed by Doble Engineering in the 1930s, the Doble test is run routinely 
by power utility companies to test insulation material used in power substa- 
tions. Over time, the electrical insulation material can break down and can lead 
to severe damage to the substation and interruptions to service if the problems 
go undetected. The insulation test procedure involves applying an alternating 
voltage across the material specimen and a reference sample. The electrical 
current, capacitance, dielectric-loss, and power factor across the test speci- 
mens are measured and analyzed in realtime. To make the test procedure 
practical, Doble has designed their equipment to be quick and easy to operate 
and able to make accurate measurements in the presence of a high level of 
electrical interference. 


Figure 4-3. Doble Test Set-Up 


[Figure: test circuit and phasor diagram. A test voltage E is applied across
the test specimen; the diagram shows the currents IC, IR, and IT and the
angle θ.]

Power factor = cos θ
E  = Test Voltage
Cp = Equiv Parallel Capacitor
Rp = Equiv Parallel Resistor
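

The quantities in the figure legend can be related by the standard
parallel-RC phasor model. The relations below are a sketch under that
assumption and are not taken from Doble's algorithms; ω (the angular
frequency of the test voltage) is a symbol introduced here, and θ is taken
as the angle between E and the total current IT.

```latex
\begin{aligned}
I_R &= \frac{E}{R_p}, \qquad I_C = \omega C_p E, \\
I_T &= \sqrt{I_R^{2} + I_C^{2}}, \\
\text{power factor} &= \cos\theta = \frac{I_R}{I_T}
  = \frac{1}{\sqrt{1 + (\omega R_p C_p)^{2}}}.
\end{aligned}
```

Under this model, good insulation has a large Rp, so IR is small compared
with IC and the power factor is close to zero; a rising power factor
indicates increasing dielectric loss.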


Doble Engineering upgraded their M-series test system from an all-analog de- 
sign to an all-digital design to reduce production cost, provide portability, in- 
crease accuracy, and provide expert advice to the operator. Elegantly simple, 
the new system consists of an IBM-compatible PC-AT with an attached DSP. 
The DSP replaces the analog signal processing hardware and executes pro- 
prietary signal processing algorithms which produce more accurate measure- 
ments. The PC host serves as an expert system and provides a graphical user 
interface (GUI) complete with dials and meters for operator ease. 


Figure 4-4. The New Doble M Series System


4.3.1 TMS320C30 and SPOX: Merging DSP and Control


By building an experimental DSP-based system using a fixed-point DSP,
Doble began the transition from analog to digital technology. Because the DSP 
lacked many general-purpose functions and was difficult to program, they 
used it as a black box to replace the analog circuitry that performed filtering 
and modulation. The DSP code was kept short and was written entirely in as- 
sembly language. Realtime I/O and instrument control were performed with an 
existing attached microprocessor board (with an Intel 80186) running a com- 
mercial realtime operating system. Problems with this black-box approach in- 
dicated that what Doble needed was a more programmable DSP platform that 
could handle both signal processing and realtime instrument control. 


When Doble went from the experimental system to a production system, their 
engineers evaluated six DSPs. The TMS320C30 offered a general-purpose 
architecture that could perform both realtime control and signal-processing 
functions. The floating-point arithmetic capability made data analysis easier 
because it guaranteed sufficient accuracy in the analysis algorithms over a 
wide dynamic range. Doble engineers also evaluated C compilers for the ’C30
and other DSPs; the ’C30 C compiler clearly generated better code. When
they learned of the Spectron Microsystems SPOX operating system, they 
were ready to revise the architecture of the system: the realtime I/O and
instrument-control functions of the 80186 and the traditional
signal-processing functions of the fixed-point DSP would be performed by the
TMS320C30. Using
SPOX would also allow Doble to use an object-oriented approach to all of their 
software development and help them make their code maintainable and easy 
to modify. 


4.3.2 From Proof-of-Concept to the Final Product 


To validate this new architecture, Doble purchased the Sonitech Spirit 30 de- 
velopment board for the PC. Because SPOX had already been ported to the 
Sonitech board, Doble completed a prototype of the new system in two 
months. Doble then ported SPOX to their custom ’C30 platform using the
SPOX-OS component product. This effort involved reconfiguring SPOX and 
writing a few device drivers for data I/O and host I/O. Because the two hard- 
ware platforms had the same SPOX system software, almost all of the proto- 
type code was reused in the product. 


While all of Doble’s earlier DSP code had been written in assembly language, the
TMS320C30 was programmed in C using the SPOX realtime kernel and math 
library. Because the SPOX math library had been coded by Spectron in as- 
sembly language, the signal-processing algorithms ran efficiently, using only 
about 50% of the ’C30 cycles. This left enough cycles to perform realtime con- 
trol functions and new signal processing algorithms. Because the resultant 
DSP software architecture was more modular, new functions could be added 
or changed easily. The multitasking capability of SPOX allowed math functions 
to run concurrently as the DSP acquired data in realtime and communicated 
with the PC host. Because of the flexibility of the DSP platform, Doble planned 
to provide different services and products to their customers using the same 
platform. 


Chapter 5


Development Support 


Throughout the design of the TMS320C3x DSPs, hardware and software engi- 
neers worked with device architects to create a processor ideally suited to 
today’s development tool technologies. The result is a full set of hardware and 
software tools. From the friendly Programmer’s Interface to TI’s unique
scan-based emulator, the development environment makes the design of
embedded systems fast and easy.


This chapter provides an overview of the development support products sup- 
porting TMS320C3x design. 




Note: 


A floating-point compiler, assembler, and linker support the TMS320C31, 
TMS320C30, TMS320C40, and all future spin-offs of the ’C3x and ’C4x gen- 
erations. Complete support for all 32-bit TMS320 processors provides an ef- 
ficient upgrade path without requiring the purchase of additional compilers, 
assemblers, or linkers. Throughout this chapter, this compiler will be referred 
to as the TMS320C31 compiler; the assembler/linker will be referred to as 
the TMS320C31 assembler/linker. 


Topic

5.1 TMS320 Optimizing ANSI C Compilers
5.2 TMS320 Programmer’s Interface (C/Assembly Source Debugger)
5.3 TMS320C31 Assembly Language Tools
5.4 TMS320 Software Simulators
5.5 TMS320C3x Evaluation Module
5.6 TMS320C3x Emulator
5.8 HP 64776 Analysis Subsystem
5.9 TMS320 Technical Support




5.1 TMS320C3x Optimizing ANSI C Compilers 
Fast code development and code maintenance over the life of a product are
concerns that all developers share. TI supports embedded system developers
with an optimizing compiler for the TMS320C31, which translates ANSI-standard
C language files into highly efficient TMS320C31 assembly language
source files, which are then input to a TMS320C31 assembler/linker. The
compiler has been validated for conformance to the ANSI C specification, using
the industry-standard Plum-Hall test suite.


The TMS320C31 compiler is complemented by the standard TMS320
Programmer’s Interface for debugging C and assembly source code. The C
compiler produces a rich set of debugging information, which is used by the
debugger, allowing source-level debugging in C. This enhances productivity and
shortens the development cycle for embedded system designers.


Key features include: 


- Complete and exact conformance with the ANSI C specification.

- Highly efficient code. The compiler incorporates state-of-the-art generic
  and target-specific optimizations (described in detail in the succeeding
  subsections). The TMS320C31 compiler performs both global optimizations
  and loop optimizations such as strength reduction. Additionally,
  it thoroughly analyzes code in order to optimize the usage of memory and
  register variables.

- ANSI-standard runtime-support library.

- ROM-able, relocatable, and re-entrant code.

- The ability to link C programs with assembly language routines, allowing
  hand coding of time-critical functions in assembly language.

- A full-featured, flexible linker that allows total control over memory
  allocation, memory configuration, and partial linking and contains features
  that allow easy runtime relocation of code.

- A C shell program that facilitates one-step translation from C source to
  executable code.

- Fast compilation to increase productivity.

- Unlimited symbol table space (up to the amount of available host
  memory).

- Complete and useful diagnostics (error messages).




- An archiver utility that allows you to collect files into a single archive file
  or library by adding new files or by extracting, deleting, or replacing files.
  You can use a library of object files as input to the linker.

- A utility that builds object libraries from source libraries.

- Ability to expand in-line both runtime-support and user-defined functions.

- A variety of listing files, including:

  - Assembly-source file, which can optionally include interlisted C-source
    code as well as register-usage information.
  - Preprocessed output file, useful for separating preprocessing/parsing
    (if memory limitations dictate) and for troubleshooting macro definitions.
  - Assembly-listing file with line numbers and opcodes.

- A big memory model with unlimited space for global data, static data, and
  constants. In the small (default) model, this space is limited to 64K words
  for faster, more efficient coding/execution.


5.1.1 TMS320C31 Compiler Optimizations 


The efficiency of a C compiler depends upon the scope and number of op- 
timizations the C compiler performs, as well as upon the application. The 
TMS320C31 compiler performs a wide variety of optimizations to improve the 
efficiency of the compiled code. The following list and explanations that follow 
describe some of the optimizations and highlight particular strengths of the C 
compilers. 


- General-Purpose C Optimizations

  - Algebraic reordering, symbolic simplification, constant folding
  - Alias disambiguation
  - Data flow optimizations
    - Copy propagation
    - Common subexpression elimination
    - Redundant assignment elimination
  - Branch optimizations/control-flow simplification
  - Loop induction variable optimizations, strength reduction
  - Loop rotation
  - Loop-invariant code motion
  - In-line expansion of function calls

- Optimizations Specific to the TMS320C31 Compiler

  - Register variables
  - Register tracking/targeting
  - Cost-based register allocation
  - Autoincrement addressing modes
  - Repeat blocks
  - Delayed branches
  - Use of registers for passing function arguments
  - Parallel instructions
  - Conditional instructions
  - Loop unrolling


5.1.1.1 General-Purpose Optimizations 


Algebraic Reordering, Symbolic Simplification, Constant Folding 


For optimal evaluation, the compiler simplifies expressions into equivalent
forms requiring fewer instructions or registers. For example, the expression
(a + b) - (c + d) requires more instructions and registers to evaluate than the
equivalent expression ((a + b) - c) - d. Operations between constants are
folded into single constants. For example, a = (b + 4) - (c + 1) becomes
a = b - c + 3. See Figure 5-1.
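The two rewrites named above can be checked side by side in plain C. The following is a minimal sketch, not compiler output; the function names are hypothetical and chosen only for illustration, and the values are assumed small enough that no integer overflow occurs.

```c
/* Expression as a programmer might write it: two sums, then a subtract. */
int as_written(int a, int b, int c, int d)
{
    return (a + b) - (c + d);
}

/* Reordered form: each step feeds the next, so fewer temporaries are live. */
int reordered(int a, int b, int c, int d)
{
    return ((a + b) - c) - d;
}

/* Constant folding: the constants 4 and 1 combine into the single constant 3. */
int fold_before(int b, int c)
{
    return (b + 4) - (c + 1);
}

int fold_after(int b, int c)
{
    return b - c + 3;
}
```

For example, fold_before(10, 2) and fold_after(10, 2) both evaluate to 11.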


Alias Disambiguation 


Programs written in C generally use many pointer variables. Frequently,
compilers are unable to determine whether or not two or more lvalues
(symbols, pointer references, or structure references) refer to the same
memory location. This aliasing of memory locations often prevents the compil- 
er from retaining values in registers, because it cannot be sure that the register 
and memory continue to hold the same values over time. Alias disambiguation 
is a technique that determines when two pointer expressions cannot point to 
the same location, allowing the compiler to freely optimize such expressions. 
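A small C sketch shows why aliasing matters. The function names are hypothetical. In the first function the compiler cannot keep *p in a register across the store through q unless it can prove that p and q differ; in the second, the local x never has its address taken, so no pointer can refer to it.

```c
/* p and q may point to the same word, so *p must be re-read after *q = 2. */
int store_then_read(int *p, int *q)
{
    *p = 1;
    *q = 2;
    return *p;          /* 2 if p == q, otherwise 1 */
}

/* x is never addressed, so the store through q cannot change it and the
   compiler is free to treat x as the constant 1. */
int no_alias_possible(void)
{
    int x = 1;
    int y = 2;
    int *q = &y;

    *q = 5;
    return x;
}
```

Calling store_then_read with two different objects returns 1; calling it with the same object twice returns 2, which is exactly the case the compiler must allow for when it cannot disambiguate the pointers.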


Data Flow Optimizations 


Collectively, the following three data flow optimizations replace expressions
with less costly ones, detect and remove unnecessary assignments, and avoid
operations that produce values already computed. The compiler performs
these data flow optimizations both locally (within basic blocks) and globally
(across entire functions). See Figure 5-1 and Figure 5-2.

- Copy Propagation

  Following an assignment to a variable, the compiler replaces references to
  the variable with its value. The value could be another variable, a constant,
  or a common subexpression. This may result in increased opportunities
  for constant folding, common subexpression elimination, or even total
  elimination of the variable.

- Common Subexpression Elimination

  When the same value is produced by two or more expressions, the compiler
  computes the value once, saves it, and reuses it.

- Redundant Assignment Elimination

  Often, copy propagation and common subexpression elimination
  optimizations result in unnecessary assignments to variables (variables with
  no subsequent reference before another assignment or before the end of
  the function). The compiler removes these dead assignments.
Figure 5-1. Data Flow Optimizations for TMS320C31 Compilers

[Listing: C source for the function simp(), which declares the local variables
a, b, c, and d, followed by the TMS320C31 compiler output.]



The constant 3, assigned to a, is copy-propagated into all uses of a. a becomes a dead variable and 
is removed completely. The sum of multiplying j by 3 (a) and 2 is simplified into a multiply by 5, which 
is computed with a shift and add. The expression (j << a) is computed once for assignment to c and 
then reused for calculating d. These optimizations are also performed across jumps. 
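The transformations described for Figure 5-1 can be sketched in C. This is an assumed reconstruction based only on the description above, not the original listing: the Result type and both function names are hypothetical, and j is assumed small and non-negative so that the shifts stay in range.

```c
/* Hypothetical container for the three computed values. */
typedef struct { int b, c, d; } Result;

/* Source form: a holds the constant 3 and (j << a) appears twice. */
Result simp_as_written(int j)
{
    int a = 3;
    Result r;

    r.b = (j * a) + (j * 2);
    r.c = (j << a);
    r.d = (j << a) + (j << r.b);
    return r;
}

/* After copy propagation, simplification, and common subexpression
   elimination: a is gone, j*3 + j*2 becomes a shift and add, and the
   shared shift is computed once in t. */
Result simp_as_optimized(int j)
{
    Result r;
    int t;

    r.b = (j << 2) + j;         /* j * 5 */
    t   = j << 3;               /* (j << a) with a replaced by 3 */
    r.c = t;
    r.d = t + (j << r.b);
    return r;
}
```

Both functions produce the same b, c, and d for any j in the assumed range, which is the property the optimizer must preserve.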


Branch Optimizations, Control-Flow Simplification 


The compiler analyzes the branching behavior of a program and rearranges 
the linear sequences of operations (basic blocks) to remove branches or re- 
dundant conditions. Unreachable code is deleted, branches to branches are 
bypassed, and conditional branches over unconditional branches are simpli- 
fied to a single conditional branch. When the value of a condition can be deter- 
mined at compile time (through copy propagation or other data flow analysis), 
a conditional branch can be deleted. Switch case lists are analyzed in the 
same way as conditional branches and are sometimes eliminated entirely.
Some simple, control-flow constructs can be reduced to conditional instruc- 
tions, totally eliminating the need for branches. See Figure 5-2. 
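As a rough illustration of two of these simplifications, the sketch below pairs a function as it might be written with a hand-simplified equivalent. The function names are hypothetical, and the second function only mirrors the kind of result the text describes; it is not actual compiler output.

```c
/* As written: one condition is known at compile time, and the if/else
   assigns one of two constants. */
int sign_as_written(int x)
{
    int verbose = 0;            /* propagated constant: the test below is always false */
    int s;

    if (verbose)
        s = 99;                 /* unreachable, so it can be deleted */
    if (x < 0)
        s = -1;
    else
        s = 1;
    return s;
}

/* Simplified: the dead test is gone and the remaining if is a single
   conditional assignment, a candidate for a conditional instruction. */
int sign_simplified(int x)
{
    int s = 1;

    if (x < 0)
        s = -1;
    return s;
}
```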


Loop Induction Variable Optimizations, Strength Reduction 


Loop induction variables are variables whose value within a loop is directly re- 
lated to the number of executions of the loop. Array indices and control vari- 
ables of FOR loops are very often induction variables. Strength reduction is 
the process of replacing costly expressions involving induction variables with 
more efficient expressions. For example, code that indexes into a sequence 
of array elements is replaced with code that increments a pointer through the 
array. Loops controlled by incrementing a counter are written as repeat blocks
or with efficient decrement-and-branch instructions. Induction variable
analysis and strength reduction together often remove all references to the 
programmer’s loop control variable, allowing it to be eliminated entirely. 
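The effect of strength reduction and induction variable elimination can be written out by hand in C. In this sketch (hypothetical function names), the first loop multiplies the index on every access, while the second walks a pointer and counts down, so the original index variable no longer exists.

```c
/* Indexed form: each access computes 2 * i before addressing the array. */
int sum_even_indexed(const int *a, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; ++i)
        sum += a[2 * i];
    return sum;
}

/* Reduced form: the multiply becomes a pointer increment of 2, and the
   loop is controlled by a simple down-counter. */
int sum_even_reduced(const int *a, int n)
{
    const int *p = a;
    int sum = 0;

    for (; n > 0; --n) {
        sum += *p;
        p += 2;
    }
    return sum;
}
```

With the array {1, 2, 3, 4, 5, 6} and n = 3, both functions add elements 0, 2, and 4 and return 9.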


Loop Rotation 


The compiler evaluates loop conditionals at the bottom of loops, saving a cost- 
ly extra branch out of the loop. In many cases, the initial entry conditional check 
and the branch are optimized out. 
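Loop rotation corresponds to rewriting a while loop as a guarded do-while loop. The sketch below uses hypothetical function names and shows the two shapes; the rotated form tests the condition once on entry and then only at the bottom of the loop.

```c
/* Top-tested loop: a test and branch at the top plus a jump back each pass. */
int count_bits_while(unsigned v)
{
    int n = 0;

    while (v != 0) {
        n += (int)(v & 1u);
        v >>= 1;
    }
    return n;
}

/* Rotated loop: one entry check, then a single bottom-of-loop test. */
int count_bits_rotated(unsigned v)
{
    int n = 0;

    if (v != 0) {
        do {
            n += (int)(v & 1u);
            v >>= 1;
        } while (v != 0);
    }
    return n;
}
```

When the compiler can prove the loop runs at least once, the entry check in the rotated form can also be dropped.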


Loop-Invariant Code Motion 


This optimization identifies expressions within loops that always compute the 
same value. The computation is moved in front of the loop, and each occur- 
rence of the expression in the loop is replaced by a reference to the precom- 
puted value. 
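A hand-written before-and-after pair makes the transformation concrete. The function names are hypothetical; the product x * y does not change inside the loop, so it can be computed once in front of it.

```c
/* As written: x * y is recomputed on every pass through the loop. */
void offset_as_written(int *a, int n, int x, int y)
{
    int i;

    for (i = 0; i < n; ++i)
        a[i] = a[i] + x * y;
}

/* After code motion: the invariant product is computed once, before the loop. */
void offset_hoisted(int *a, int n, int x, int y)
{
    int i;
    int t = x * y;

    for (i = 0; i < n; ++i)
        a[i] = a[i] + t;
}
```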


In-Line Expansion of Function Calls 


The special keyword inline directs the compiler to replace calls to a function 
with in-line code, saving the overhead associated with a function call as well 
as providing increased opportunities to apply other optimizations. See 
Figure 5—2 and Figure 5-3. 
Figure 5-2. Copy Propagation and Control-Flow Simplification for TMS320C31 Compilers


fsm()
{
    enum { ALPHA, BETA, GAMMA, OMEGA } state = ALPHA;
    int *input;

    while (state != OMEGA)
        switch (state)
        {
            case ALPHA: state = *input++ ? GAMMA : BETA;  break;
            case BETA : state = *input++ ? ALPHA : GAMMA; break;
            case GAMMA: state = *input++ ? OMEGA : GAMMA; break;
        }
}

TMS320C31 compiler output is (instruction fields not legible; comments only):

    ;   allocated to user var 'input'
    ; initial state == ALPHA.
    ;   if input == 0 goto state BETA
    ;   else goto state GAMMA
    ; state == ALPHA.
    ;   if input != 0 goto state GAMMA
    ; state == BETA.
    ;   if input != 0 goto state ALPHA
    ; state == GAMMA.
    ;   if input != 0 goto state OMEGA
    ; state == GAMMA.
    ;   if input == 0 goto state GAMMA
    ; state == OMEGA.

The switch statement and the state variable from this simple finite-state machine process are 
optimized completely away, leaving a streamlined series of conditional branches. 
Figure 5-3. In-Line Function Expansion for TMS320C31 Compilers


inline blkcpy(char *to, char *from, int n)
{
    if (n > 0)
        do *to++ = *from++; while (--n != 0);
}

struct s { int a,b,c[10]; } s;

initstr(struct s *ps, char t[12])
{
    blkcpy((char *)ps, t, 12);
}

TMS320C31 compiler output is:

initstr
*
*   R2  assigned to variable 't'
*   AR2 assigned to variable 'blkcpy_1_to'
*   AR4 assigned to variable 'blkcpy_1_from'
*   BK  assigned to variable 'ps'
*   RC  assigned to variable 'L$1'
*
        LDI     BK,AR2          ; blkcpy_1_to = ps
        LDI     R2,AR4          ; blkcpy_1_from = t
        LDI     *AR4++,R0       ;
        RPTS    10              ;| expansion of blkcpy:
        STI     R0,*AR2++       ;| copy 12 words
     || LDI     *AR4++,R0       ;
        STI     R0,*AR2++


The special in-line declaration of blkcpy results in the call being replaced with the function’s
body. The compiler creates temporary variables blkcpy_1_to and blkcpy_1_from,
corresponding to the parameters of blkcpy. Often, copy propagation can eliminate assignments
to such variables when the argument expressions are not reused after the call.
5.1.1.2 Optimizations Specific to the TMS320C31 Compiler


Register Variables 


The compiler helps maximize the use of registers for storing local variables, 
parameters, and temporary values. Variables stored in registers can be ac- 
cessed more efficiently than variables in memory. This optimization is particu- 
larly effective for pointers that arise when array index constructs are turned into 
loop induction variables. See Figure 5—4 and Figure 5-5. 


Figure 5—4. Register Variables and Register Tracking/Targeting 


int gvar;

reg(int i, int j)
{
    gvar = call() & i;
    j    = gvar + i;
}

TMS320C31 compiler output is:

reg:
*
*   R4 is allocated to user var 'i'
*   R5 is allocated to user var 'j'
*
        CALL    _call
        AND     R4,R0
        STI     R0,@_gvar
        ADDI    R4,R0,R5        ; tracks gvar in R0,
                                ; targets result into R5 (j)


The compiler allocates local variables i and j into registers R4 and R5, 
as indicated by the comments in the assembly listing. Allocating i to 
R4 and tracking gvar in RO allows the sum gvar + i to be computed 
with a 3-operand instruction, targeting the result directly into j in R5. 


Register Tracking/Targeting 


The compiler tracks the contents of registers so that it avoids reloading values 
if they are used again soon. Variables, constants, and structure references 
such as (a.b) are tracked through both straight-line code and forward 
branches. The compiler also uses register targeting to compute expressions 
directly into specific registers when required, as in the case of assigning to reg- 
ister variables or returning values from functions. See Figure 5—4. 
Cost-Based Register Allocation


When optimization is enabled, the compiler allocates registers to user variables
and compiler temporary values according to their type, use, and frequency.
Variables used within loops are weighted to have priority over others, and those
variables whose uses don’t overlap may be allocated to the same register.
Variables with specific requirements are allocated into registers that can
accommodate them.


Autoincrement Addressing Modes 


For pointer expressions of the form *p++, *p--, *++p, or *--p, the compiler
uses efficient TMS320C31 autoincrement addressing modes. In many cases,
where code steps through an array in a loop, such as for (i=0; i<N; ++i)
a[i]..., the loop optimizations convert the array’s references to indirect
references through autoincremented register variable pointers. See Figure 5-5.
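The pointer forms listed above are ordinary C idioms. A minimal sketch, with hypothetical names and no overflow or underflow checks, is a small software stack plus a word copy; each dereference combines the memory access with the pointer update, which is the pattern an autoincrement addressing mode can absorb.

```c
#define STACK_WORDS 16

static int stack_mem[STACK_WORDS];
static int *sp = stack_mem;             /* points at the next free slot */

/* Store, then step the pointer forward: the *p++ form. */
void push(int v)
{
    *sp++ = v;
}

/* Step the pointer back, then load: the *--p form. */
int pop(void)
{
    return *--sp;
}

/* Copy n words; both pointers advance as part of each access. */
void copy_words(int *dst, const int *src, int n)
{
    while (n-- > 0)
        *dst++ = *src++;
}
```

Pushing 1, 2, and 3 and then popping three times returns 3, 2, and 1.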


Repeat Blocks 


The TMS320C31 compiler supports zero-overhead loops with the RPTS (re- 
peat single) and RPTB (repeat block) instructions. The compiler can detect 
loops controlled by counters and generate them by using the efficient repeat 
forms: RPTS for single-instruction loops, or RPTB for larger loops. For both 
forms, the iteration count can be either a constant or an expression. See 
Figure 5-3 and Figure 5—5. 


Induction variable elimination and loop test replacement allow the compiler to 
recognize the loop as a simple counting loop and then generate a repeat block. 
Strength reduction turns the array references into efficient pointer autoincre- 
ments. 
Figure 5-5. Repeat Blocks, Autoincrement Addressing Modes, Parallel Instructions,
Strength Reduction, Induction Variable Elimination, Register Variables, and
Loop Test Replacement for Floating-Point Compilers


float a[10], b[10];

scale(float k)
{
    int i;
    for (i = 0; i < 10; ++i)
        a[i] = b[i] * k;
}

TMS320C31 compiler output is:

_scale:
        LDI     @CONST+0,AR4    ; AR4 = &a[0]
        LDI     @CONST+1,AR5    ; AR5 = &b[0]
        MPYF    R4,*AR5++,R0    ; compute first product
        RPTS    8               ; loop for next 9
        STF     R0,*AR4++       ; store this product...
     || MPYF    R4,*AR5++,R0    ; ...and compute next
        STF     R0,*AR4++       ; store last product


This process shows general and floating-point-specific optimizations working together to generate 
highly efficient code. Induction variable elimination and loop test replacement allow the compiler to 
recognize the loop as a simple counting loop and then generate a repeat block. Strength reduction 
turns the array’s references into efficient pointer autoincrements. The compiler unrolls the loop once 
to separate the first multiply and last store, allowing the body of the loop to be written as a single 
parallel instruction. 


Delayed Instructions 


The TMS320C31 compiler supports delayed branch instructions that can be 
inserted three instructions early in an instruction stream, avoiding costly pipe- 
line flushes associated with normal branches. The compiler uses uncondition- 
al delayed branches wherever possible, and conditional delayed branches for 
counting loops. See Figure 5-6. 
Figure 5-6. TMS320C31 Compiler Delayed Branch Optimizations


wait(volatile int *p)
{
    for (;;)
        if (*p & 0x80) *p |= 0xF0;
}

TMS320C31 compiler output is (instruction fields not legible; labels and comments only):

_wait:
L6:
        ; R0 = *p (AR4 is allocated to p)
        ; test *p & 0x80
        ; false: loop back
        ; true: loop back (delayed)
        ; branch occurs


The unconditional branch at the bottom of this loop is written as a delayed branch, 
allowing it to execute in one machine cycle. 


Use of Registers for Passing Function Arguments 


The compiler supports a new, optional calling sequence that passes arguments
in registers rather than pushing them onto the stack. This can result in
significant improvement in performance, especially if calls are important in the 
application. See Figure 5—2. 


Parallel Instructions 


Several floating-point or integer instructions such as load/load, store/operate, 
and multiply/add can be paired with each other and executed in parallel. When 
adjacent instructions match the addressing requirements, the compiler com- 
bines them in parallel. Although the code generator performs this optimization, 
the optimizer greatly increases effectiveness because operands are more like- 
ly to be in registers. See Figure 5-3 and Figure 5-5. 


Conditional Instructions 


The ’C31 load instructions can be executed conditionally. For simple
assignments such as a = condition ? expr1 : expr2 or if (condition)
a = b, the compiler can use conditional loads to avoid costly branches.
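The two source forms mentioned above look like this in C. The function names are hypothetical, and the sketch shows only the C side; which instructions the compiler actually selects depends on the surrounding code.

```c
/* Form 1: a = condition ? expr1 : expr2 */
int select_larger(int x, int y)
{
    int a = (x > y) ? x : y;

    return a;
}

/* Form 2: if (condition) a = b */
int replace_if(int a, int b, int condition)
{
    if (condition)
        a = b;
    return a;
}
```

In both functions the choice affects only which value ends up in a, so no change in control flow is required to implement it.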
Loop Unrolling


When the compiler can determine that a short loop is executed a low, constant 
number of times, it replicates the body of the loop rather than generating the 
loop; note that low and short are subjective judgments made by the compiler. 
This avoids any branches or use of the repeat registers. See Figure 5-7. 


Figure 5—7. Loop Unrolling 


add3(int a[3])
{
    int i, sum = 0;
    for (i = 0; i < 3; ++i) sum += a[i];

    return sum;
}

TMS320C31 compiler output is:

_add3:
        LDI     *-FP(2),AR4     ; AR4 = &a[0]
        LDI     *AR4++,RC       ; sum += a[0]
        ADDI    *AR4++,RC       ; sum += a[1]
        ADDI    *AR4++,RC       ; sum += a[2]
        LDI     RC,R0           ; return sum


The compiler determines that this loop is short enough to unroll, resulting 
in a simple 3-instruction sequence and no branches. 
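Written back in C, the unrolled form of this loop is a straight-line sequence. The sketch below uses hypothetical function names and mirrors the shape of the listing in Figure 5-7: one load followed by two adds, with no loop counter.

```c
/* Rolled form: the loop from the figure. */
int add3_rolled(const int a[3])
{
    int i, sum = 0;

    for (i = 0; i < 3; ++i)
        sum += a[i];
    return sum;
}

/* Unrolled form: the body is replicated three times and the loop control
   disappears. */
int add3_unrolled(const int a[3])
{
    int sum = a[0];

    sum += a[1];
    sum += a[2];
    return sum;
}
```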
5.2 TMS320 Programmer’s Interface (C/Assembly Source Debugger)


The TMS320 Programmer’s Interface brings new levels of power and flexibility 
to embedded systems development. The interface/debugger is now available 
on virtually all TMS320 development tools, so moving to another tool or anoth- 
er generation of processor is greatly simplified. 


The debugger is an advanced software interface that runs on a PC and sup- 
ports TI’s unique, scan-based, realtime, TMS320C3x XDS emulator. The de- 
bugger provides complete control over programs written in C or assembly lan- 
guage. 


The debugger improves productivity by enabling you to debug a program in 
the language in which it is written. Programs can be debugged in C, assembly 
language, or both. The debugger also has profiling capabilities that show 
where to focus development time by quickly identifying the “hot” or time-con- 
suming sections of a program. 


Figure 5-8. The Basic Debugger Display

[Screen image: the basic debugger display. Labeled regions: pulldown menus,
disassembly display, C source display, interactive command entry and history
window, function call traceback, natural-format data displays, and scrolling
data displays with on-screen, interactive editing.]


The debugger is easy to learn and use. Its window-/mouse-/menu-oriented 
interface reduces learning time and eliminates the need to memorize complex 
commands. The debugger’s customizable displays and flexible command 
entry let you develop a debugging environment that suits the system’s needs 
(see Figure 5-8). A shortened learning curve and increased productivity
reduce the software development cycle, speeding products to market.


Conditional execution and single-stepping (including single-stepping into and
over function calls) give you complete control over program execution. A
breakpoint can be set or cleared with a click of the mouse or by typing
commands. A user-definable memory map identifies the portions of target memory
that the debugger can access. You can load only the symbol-table
portion of an object file to work with systems that have code in ROM. The
debugger can execute commands from a batch file, providing an easy method
for entering often-used command sequences. Key features include:


- Multilevel debugging. The debugger allows you to debug both C and
  assembly language code. While debugging a C program, you can choose
  to view the C source, the disassembly of the object code created from the
  C source, or both.

- Fully configurable, state-of-the-art, window-oriented interface. The
  debugger separates code, data, and commands into manageable
  information. You can select from several displays. Or, since the debugger’s
  display is completely configurable, you can create the interface that best
  suits the application. The display’s colors, physical appearance of
  displayed features (such as window borders), and window size and position
  can be changed.

- Flexible command entry. Commands can be entered by using a mouse,
  the function keys, or the pull-down menus. The debugger’s command
  history can be used to re-enter commands.

- On-screen editing. Any data value displayed in any window can easily
  be changed by pointing (with the mouse) at the value, clicking, and
  entering the correct value.

- Continuous update. The debugger continuously updates information on
  the screen, highlighting changed values.

- Comprehensive data display. You can easily create windows for displaying
  and editing the values of variables, arrays, structures, and pointers (any
  kind of data) in their natural format (float, int, char, enum, or pointer).
  Entire linked lists can be displayed (see Figure 5-9).

- Patch assembler. You can modify code from the debugger command line
  without reassembling your assembly source.
Figure 5-9. Debugger’s Data Display

[Screen image: DISP windows showing the structure str and the structures
reached through its pointer member f3 (*str.f3 and *str.f3->f3), alongside a
WATCH window.]


- Powerful command set. The TMS320 debugger supports a small but
  powerful command set that makes full use of C expressions. One debugger
  command performs actions that might require several commands in
  another system.

- Compatibility. The TMS320C31 C source debugger runs on IBM PC/ATs
  and compatible PCs. For the simulator, the debugger is available on Sun
  workstations.

- Profiler. The C source debugger has an option for profiling software.
  When you are deciding whether to convert portions of a program from C
  to assembly, it is helpful to know which functions take the most time. A
  profiler that measures the amount of execution time in different functions or
  portions of a program is very helpful. The profiler is easy to use and
  provides a number of features, including:

  - Elegant user interface. The TI code profiler shares the same fully
    configurable, window-oriented, and mouse-driven interface as the TI
    C source debugger, so learning to profile is quick and easy.

  - Multilevel profiling. An assembly window and a C window are displayed,
    so you can profile C code, assembly code, or both simultaneously.

  - Powerful command set. A rich set of commands is available to select
    and manipulate profile areas on the global, module, function, and
    explicit levels, so you can efficiently profile even the most complex
    applications.

  - Comprehensive statistics. The profiler provides all the information
    you need to identify bottlenecks in your code:

    - The number of times each area was entered during the profile
      session.
    - The total execution time of an area, including or excluding the
      execution time of any subroutines called from within that area.
    - The maximum time for one iteration of an area, including or
      excluding the execution time of any subroutines called from within
      that area.

  - Versatile display. The ability to choose profile areas, the type of
    statistical data, and sorting criteria ensures an efficient, customized
    display of the statistics. The data can also be accompanied by
    histograms to show the statistical relationship between profile areas.

  - Disabled areas. You can disable portions of a profile area to prevent
    them from adding to the statistics. This is convenient for removing the
    timing impact of standard library functions or a fully optimized portion
    of code.

  - Simplicity. The profiler’s simple setup, default configurations,
    “canned” commands, and inherent flexibility facilitate sophisticated
    profiling within a short time.
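The statistics listed under "Comprehensive statistics" can be pictured as a small record kept per profile area. The following C sketch is purely illustrative: the ProfileArea type, its field names, and profile_record() are hypothetical and do not describe how the TI profiler is implemented. It only shows how an entry count, inclusive and exclusive totals, and per-pass maximums relate to one another.

```c
/* Hypothetical bookkeeping for one profile area. */
typedef struct {
    unsigned long entries;      /* times the area was entered */
    unsigned long total_incl;   /* total cycles, including called subroutines */
    unsigned long total_excl;   /* total cycles, excluding called subroutines */
    unsigned long max_incl;     /* longest single pass, inclusive */
    unsigned long max_excl;     /* longest single pass, exclusive */
} ProfileArea;

/* Record one pass through an area. 'cycles' is the time for the whole pass;
   'callee_cycles' is the part of it spent in subroutines called from the area. */
void profile_record(ProfileArea *area, unsigned long cycles,
                    unsigned long callee_cycles)
{
    unsigned long own = cycles - callee_cycles;

    area->entries++;
    area->total_incl += cycles;
    area->total_excl += own;
    if (cycles > area->max_incl)
        area->max_incl = cycles;
    if (own > area->max_excl)
        area->max_excl = own;
}
```

An area with a large inclusive total but a small exclusive total spends most of its time in the routines it calls, which suggests looking at those routines first.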
5.3 TMS320C31 Assembly Language Tools


The TMS320C31 assembly language tools are code-generation tools that 
convert assembly language source files into executable object code. Key fea- 
tures include: 


- Macro capabilities and library functions
- Conditional assembly
- Relocatable modules
- Complete error diagnostics
- Symbol table and cross-references


The assembler translates assembly language source files into machine lan- 
guage object files. Source files can contain instructions, assembler directives, 
and macro directives. Assembler directives control various aspects of the as- 
sembly process such as the source-listing format, symbol definition, and the 
way the source code is placed into sections. The assembler has the following 
features: 


- Processes the source statements in a text file to produce a relocatable
  object file
- Produces a source listing (if requested) and provides control over this
  listing
- Appends a cross-reference listing to the source listing (if requested)
- Allows segmentation of user’s code
- Maintains an SPC (section program counter) for each section of object
  code
- Defines and references global symbols
- Assembles conditional blocks
- Supports macros, allowing the user to define macros either in-line or
  within a macro library
The linker combines object files into a single executable object module. As it
creates the executable module, it performs relocation operations and resolves 
external references. The linker accepts COFF (common object file format) ob- 
ject files (created by the assembler) as its input. It can also accept archive li- 
brary members and modules created by a previous linker run. Linker directives 
allow you to combine object file sections, bind sections and symbols to specific 
addresses, and define/redefine global symbols. The linker has these features: 


- Defines a memory model that conforms to the target system’s memory
- Combines object file sections
- Allocates sections into specific areas within the target system’s memory
- Defines or redefines global symbols to specific values
- Relocates sections to final addresses
- Resolves undefined external references between the input files
- Allows separate load-time and runtime addresses for sections of code


The archiver makes it possible to collect a group of files into a single archive 
file. For example, several macros can be collected together into a macro li- 
brary. The assembler will search through the library and use the members that 
are called as macros by the source file. Also, it is possible to use the archiver 
to collect a group of object files into an object library. The linker will include the 
members in the library that resolve external references during the link. 


Most EPROM programmers do not accept COFF object files as their input. The
ROM30 object format converter must be used to convert the COFF object
file into Intel, Tektronix, or TI-tagged hex object format. ROM30 is part of the
assembler, linker, and archiver package. The converted file can then be
downloaded into the EPROM programmer.
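To give a feel for one of the output formats named above, the following C sketch formats a single Intel hex data record: a colon, a byte count, a 16-bit address, a record type of 00, the data bytes, and a checksum equal to the two's complement of the sum of all preceding bytes. The function name and interface are hypothetical. This is a generic illustration of the record layout only; it is not ROM30, and it ignores ROM30's options and any word-width handling for the 32-bit 'C3x.

```c
#include <stdio.h>
#include <stddef.h>

/* Format one Intel hex data record (type 00) into 'out'.
   'out' must have room for 2 * len + 12 characters; len must be 255 or less.
   Returns the number of characters written, not counting the terminator. */
int intel_hex_data_record(char *out, unsigned addr,
                          const unsigned char *data, size_t len)
{
    unsigned sum = (unsigned)len + ((addr >> 8) & 0xFFu) + (addr & 0xFFu);
    size_t i;
    int n;

    n = sprintf(out, ":%02X%04X00", (unsigned)len, addr & 0xFFFFu);
    for (i = 0; i < len; ++i) {
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
        sum += data[i];
    }
    /* Checksum: two's complement of the low byte of the running sum. */
    n += sprintf(out + n, "%02X", (0x100u - (sum & 0xFFu)) & 0xFFu);
    return n;
}
```

For the three bytes 02 33 7A at address 0030, the function produces the record :0300300002337A1E.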
5.4 TMS320C3x Software Simulator


A simulator is a software program that simulates the TMS320C3x micropro- 
cessor and microcomputer modes for cost-effective software development 
and program verification in non-realtime. With the inexpensive software simu- 
lator, you can debug without target hardware. Files can be associated with I/O 
ports so that specific I/O values can be used during test and debug. Time-criti- 
cal code, as well as individual portions of the program, can be tested. The 
clock’s counter allows loop timing during code optimization. Breakpoints can 
be established according to read/write executions (using either program or 
data memory) or instruction acquisitions. The simulator uses the standard 
C/assembly source debugger interface (described in Section 5.1), allowing the 
user to debug code in C, assembly, or both. 


Key features of the TMS320C3x software simulator include:

□ Execution of user-oriented DSP programs on a host computer
□ Inspection and modification of registers
□ Data and program memory modification and display:
  ■ Modification of an entire block at any time
  ■ Initialization of memory before a program is loaded
□ Simulation of peripherals, caches, and pipelined timings
□ Extraction of instruction cycle timing for “device performance” analysis
□ Programmable breakpoints on:
  ■ Instruction acquisition
  ■ Memory reads and writes (data or program)
  ■ Data patterns on the data bus or the program bus
  ■ Error conditions
□ Trace on:
  ■ Accumulator
  ■ Program counter
  ■ Auxiliary registers
□ Single-stepping of instructions
□ Interrupt generation at user-specified intervals
□ Error messages for:
  ■ Illegal opcodes
  ■ Invalid data entries
□ Execution of commands from a journal file
□ A branch to “self” is detected
□ Execution is halted


Once program execution is suspended, the internal registers and both pro- 
gram and data memories can be inspected and/or modified. The trace memory 
can also be displayed. A record of the simulation session can be maintained 
in a journal file so that it can be re-executed to regain the same machine state 
during another simulation session. 


□ Simulation of the TMS320C31’s entire instruction set
□ Simulation of the key features of the TMS320C31 peripherals
□ Command entry from either menu-driven keystrokes (menu mode) or line
  mode
□ Help menus for all screen-displayed modes
□ Interface that can be user-customized
□ Simulation parameters quickly stored/retrieved from files to facilitate
  preparation for individual sessions
□ Reverse assembly for editing and reassembling source statements
□ Memory that can be displayed (at the same time) as
  ■ Hexadecimal 32-bit values
  ■ Assembled source
□ Execution modes
  ■ Single/multiple instruction count
  ■ Single/multiple cycle count
  ■ Until condition is met
  ■ While condition exists
  ■ For set loop count
  ■ Unrestricted run with halt by keyed input
□ Trace execution with display choices
  ■ Designated expression values
  ■ Cache memory
  ■ Instruction pipeline
□ Simulation of cache utilization
□ Cycle counting
  ■ Display of the number of clock cycles in a single-step operation or in
    the run mode
  ■ Externally generated mode that can be configured with wait states for
    accurate cycle counting


The simulator lets you verify and monitor the state of the processor. Simulation 
speed can be either thousands of instructions per second (VAX VMS and 
SUN-3 UNIX) or hundreds of instructions per second (PC-DOS/MS-DOS). 


The TMS320C31 simulator is available for the IBM PC-DOS/MS-DOS 
(5.25-inch floppy), the VAX/VMS (in backup format on 1600-bpi magnetic 
tape), and the SUN-3/4 UNIX (in TAR format on 1600-bpi magnetic tape) oper- 
ating systems. The PC configuration requires a minimum of 512K bytes for 
the TMS320C31 simulator. 
5.5 TMS320C3x Evaluation Module


The TMS320C3x evaluation module (EVM) is a low-cost development board 
used for device evaluation, benchmarking, and limited system debug. The 
TMS320C3x EVM (see Figure 5-10) eliminates the cost barrier to evaluating 
and developing embedded systems based on the TMS320C31. 


Features include:

□ Assembler
□ On-board memory
□ Host upload/download capabilities
□ I/O capability


Figure 5-10. TMS320C3x EVM 
The TMS320C3x EVM enables you to benchmark and evaluate code in realtime
while the device is operating at 30 MHz in the rich development environment
of the TMS320C3x assembler/linker and C/assembly source debugger interface.
Applications can be benchmarked and tested easily with the analog-ready
interface.


The TMS320C3x EVM comes complete with a PC half-card and software
package. The EVM board contains:

□ One TMS320C30, a 33-MFLOP, 32-bit processor. TMS320C31 applications
  can be developed by using only those ’C30 features available on a ’C31
□ 16K-word, zero wait-state SRAM, allowing coding of most algorithms
  directly on the board
□ Analog interface for embedded systems development
□ An external serial-port interface that can be used for connecting multiple
  EVMs or for extra analog interfacing
□ A host port for PC communications
□ Embedded emulation support via the 74ACT8990 test bus controller


The system also comes with all of the software required to begin application 
development on a PC host: 


□ The window-oriented, mouse-driven interface supports downloading,
  executing, and debugging of assembly code or C code, including
  modification/display of memory and registers, software single-step, and
  breakpoint capabilities.
□ The TMS320C3x assembler/linker is also included with the EVM. For
  high-level language programming, the optimizing ANSI C and the Ada
  compilers are offered separately.


The TMS320C3x EVM is supported on PC-AT/MS-DOS (version 3.00 or high- 
er) platforms. 
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5.6 TMS320C3x Emulator 
The TMS320 Extended Development Systems (XDSs) are powerful, full-speed
emulators used for system-level integration and debug. TI developed the
world’s first in-system scan-based emulator (XDS) for TMS320C3x processors.


Scan-based emulation is a unique, nonintrusive approach to system emulation,
integration, and debug. This approach was conceived and developed by TI to
address hardware/software characteristics (reduced internal bus visibility,
highly pipelined architectures, faster cycle times, higher-density packaging)
that are inherent to sophisticated VLSI systems.


Scan-based emulation eliminates special “bond-out” emulation devices, tar- 
get cable/buffer signal degradation, and the mechanical and reliability prob- 
lems associated with target connectors and surface-mount packaging. With 
scan-based emulation, your program can execute in realtime from internal or 
external target memory — no extra wait states are introduced by the emulator 
at any clock speed. 


The TMS320C31’s architecture implements scan-based emulation through in- 
ternal, shift-register, scan chains accessed by a single serial interface. The 
scan chains provide access to internal device registers and state machines, 
allowing complete visibility and control. This nonintrusive approach even oper- 
ates in a production environment where the DSP is soldered into a target sys- 
tem. 


Since program execution takes place on the TMS320C31 in the target system, 
there are no timing differences during emulation. This new design offers signif- 
icant advantages over traditional emulators. These advantages include: 


□ No cable length transmission line problems
□ Nonintrusive system
□ No loading problems on signals
□ No artificial memory limitations
□ TMS320C3x C/assembly source debugger interface
□ Easy installation
□ In-system emulation
□ No variance from device's data sheet specifications




The TMS320C3x XDS emulator (see Figure 5-11) is a user-friendly, PC- 
based development system that supports hardware development on the 
TMS320C30 and TMS320C31. This emulator provides a means for develop- 
ing the software and hardware within a target system. Access is provided to 
every memory location and register of the TMS320C3x through the use of a 
revolutionary scan path interface. The TMS320C3x XDS emulator board inter- 
prets commands and converts these commands into the appropriate signal 
sequences necessary to control the TMS320C3x in your target system. Key 
features of the TMS320C3x XDS emulator include: 


□ Full-speed execution and monitoring of the TMS320C3x in your target
  system via a 12-pin target connector
□ TMS320 C/assembly source debugging (PC/MS-DOS) via TI’s standard
  windowed Programmer’s Interface (see Section 5.2)
□ 200 software breakpoints
□ Software trace/timing
□ Single-step execution
□ Loading/inspecting/modification of all registers
□ Uploading/downloading of program memory and data memory
□ Benchmarking of execution time in clock cycles
Figure 5-11. TMS320C3x XDS Emulator




Software breakpoints allow program execution to be halted at a specified 
instruction address. When a given breakpoint is reached, the program halts 
execution. At this point, the status of the registers and of the CPU is available. 
Their contents are visible in the appropriate windows; to view the contents of 
other memory locations, only one command is required. 


Software trace lets you view the state of the TMS320C3x when a breakpoint 
is reached. This information can be saved in a file for future analysis. Software 
timing allows you to track the clock cycles between breakpoints for bench- 
marking of time-critical code. 


Single-step execution lets you step through the program one instruction at
a time. After each instruction, the status of the registers and CPU is
displayed. This provides greater flexibility during software debug and
helps reduce development time.


Object code can be downloaded to any valid TMS320C3x memory location 
(program or data) via the scan path interface. Downloading a 1K-byte object 
program typically takes 100 ms. In addition, by inspecting and modifying the 
registers while single-stepping through a program, you can examine and 
modify program code or parameters. 
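
As a rough worked example of the download figure quoted above, the Python sketch below scales the typical 100 ms per 1K bytes to larger images. Linear scaling with image size is an assumption made here for illustration; the source gives only the single 1K-byte data point, and the function name is hypothetical.

```python
BYTES_PER_KBYTE = 1024
TYPICAL_SECONDS_PER_KBYTE = 0.100  # from the text: 1K bytes in about 100 ms


def estimated_download_seconds(size_bytes: int) -> float:
    """Estimate scan-path download time, assuming time grows linearly with size."""
    return size_bytes / BYTES_PER_KBYTE * TYPICAL_SECONDS_PER_KBYTE


# A 64K-byte image would take roughly 6.4 seconds under this assumption.
print(estimated_download_seconds(64 * 1024))
```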


The emulator’s configurability gives your system flexibility. You can configure
both memory and screen color. The address range, memory type, and access
type assigned to each location can also be configured. The memory map,
which may include EPROM, SRAM, DRAM, and on-chip memory and peripherals,
can be configured to reflect the actual peripheral environment of the target
system, including wait states and access privileges.


TMS320C3x XDS System Requirements 


Host              IBM PC-AT
Slot              One and one-half 16-bit slots
Memory            Minimum of 640K words
Storage           One floppy drive and one hard drive
Operating System  PC/MS-DOS 2.0 or later version
Power Supply      Minimum; approximately 3 amps @ 5 volts (150 watts)
5.7 TMS320C3x Application Board With Software Demo

Key features of the TMS320C3x application board are:

□ 16Kx32-bit, zero wait-state, full-speed SRAM on the primary bus
□ Two selectable banks of 8Kx32-bit, zero wait-state, full-speed SRAM on
  the expansion bus
□ TMS320C30 DSP
□ 512Kx32-bit DRAM (user-upgradable to 1Mx32-bits)


The large amount of on-board SRAM affords realtime emulation and memory 
storage flexibility for a variety of algorithms. The on-board SRAM provides 
zero wait-state access to memory allowing read/write in realtime. 


Three types of DRAM cycles are used on the TMS320C3x application board:
single-word read, single-word write, and page-mode read. These operations
require four, two, and one wait state per access, respectively. Note that when
you invoke page-mode read while accessing the emulator’s DRAM, fewer wait
states are required. Page-mode DRAM is often used to improve “bulk storage”
performance. Page-mode read cycles are automatically invoked when the
TMS320C3x performs two or more back-to-back read cycles on the same
memory page; one page of memory holds 256 words, the default memory
bank size for the TMS320C3x.
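
The wait-state figures above can be turned into a small cost model. The Python sketch below counts wait states for a sequence of DRAM accesses using the four/two/one values from the text. It assumes that the first read in a back-to-back run is a normal single-word read and that only the following reads on the same 256-word page run in page mode; the source does not spell out that detail, and the function name is hypothetical.

```python
PAGE_WORDS = 256  # words per memory page, as stated in the text
WAIT_STATES = {"read": 4, "write": 2, "page_read": 1}


def dram_wait_states(accesses):
    """Total wait states for a list of (op, word_address) DRAM accesses.

    op is "read" or "write". A read that immediately follows another read on
    the same page is counted as a page-mode read (an assumption of this sketch).
    """
    total = 0
    previous = None  # (op, page) of the previous access
    for op, address in accesses:
        if op not in ("read", "write"):
            raise ValueError(f"unknown access type: {op!r}")
        page = address // PAGE_WORDS
        if op == "read" and previous == ("read", page):
            total += WAIT_STATES["page_read"]
        else:
            total += WAIT_STATES[op]
        previous = (op, page)
    return total


# Three back-to-back reads on page 0, one read on page 1, then a write.
trace = [("read", 0x000), ("read", 0x001), ("read", 0x002),
         ("read", 0x100), ("write", 0x101)]
print(dram_wait_states(trace))
```

Under these assumptions, keeping a buffer that is read sequentially inside one 256-word page is noticeably cheaper than striding across pages.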


SPOX Operating System software is also available for the application board. 




5.8 HP 64776 Analysis Subsystem 


TI and Hewlett-Packard jointly designed and developed the HP 64776 Analysis
Subsystem, an emulator/analyzer for the TMS320C3x (see Figure 5-12).
(For TMS320C31 analysis, an adapter is available from HP to use the subsys- 
tem with a surface-mounted TMS320C31.) The HP 64776 combines with the 
TI TMS320C3x XDS emulator to yield a complete tool set for integrating hard- 
ware with software, producing an extremely powerful debug environment. 
HP’s active probe technology yields the maximum electrical and mechanical 
transparencies, improved signal quality, and realtime control and debug of the 
target system at full operating speed. 


The complete analysis subsystem integrates the HP 64776, the TMS320C3x 
XDS, and the C source debugger (described in Section 5.2) in a stand-alone 
PC environment. The TI debugger acts as the user interface, and
communications between the subsystem and the PC are handled through an RS-232C
connector. This powerful system provides software and hardware breakpoint 
and trace, as well as sophisticated bus-cycle analysis. 


Figure 5-12. HP 64776 Analysis Subsystem 
Key features of the subsystem include:

□ 64 analysis channels that can trace the TMS320C3x’s primary or expansion
  bus as well as status information. Nonintrusive analysis lets you view
  the processor’s bus cycles in realtime. Analysis can be performed on the
  following signals:

    A0-A23                   XA0-XA12                   INT0-INT3
    (primary-bus address)    (expansion-bus address)
    D0-D31                   XD0-XD31                   TCLK0
    (primary-bus data)       (expansion-bus data)
    STRB                     MSTRB                      TCLK1
    R/W                      IOSTRB                     XF0
    HOLDA                    IACK                       XF1

□ Trace specifications that can be set up easily, using address, data, and
  status-event comparators. A range comparator can also be used to qualify
  addresses or data.
□ Hardware breakpoint capabilities that enable you to detect a specified
  event and stop the processor. Once the processor is stopped, the debug
  capabilities of the TMS320C3x XDS facilitate isolation of the target’s
  hardware/software problems.
□ The ability to drive triggered signals to and receive them from other
  instruments such as logic analyzers and oscilloscopes, allowing synchronized
  measurements between tools.


The HP 64776 operates on PC/AT platforms utilizing DOS (version 3.0 or 
higher). 
5.9 TMS320 Technical Support


5.9.1 Technical Documentation 


A wide variety of technical literature is available to assist you through the de- 
sign cycle. These documents include product and preview bulletins, data 
sheets, user’s and reference guides, over 2000 pages of application notes, 
and textbooks offered by Prentice-Hall, John Wiley and Sons, and Computer 
Science Press. To inquire about available TMS320 literature, call the Custom- 
er Response Center (CRC): 


(214) 995-6611 


The following list describes the general contents of each major category of 
technical documentation available through the Customer Response Center: 


□ Product and preview bulletins and product briefs give an overview of the
  devices and development support within the TMS320 family, presenting
  capabilities, diagrams, and hardware/software applications.
□ User’s guides for TMS320 processors provide detailed information regarding
  the architecture of the device, its operation, assembly language
  instructions, and hardware and software applications.
□ Data sheets include electrical specifications, timing characteristics, and
  mechanical data for a device.
□ Application books/reports describe theory and implementation of selected
  TMS320 applications, including algorithms, code, and block/schematic/
  logic diagrams. Currently, there are over 2000 pages of application reports
  to support the TMS320 family.
□ Technology brochures provide an overview of various implementations of
  DSP technology.
5.9.2 Details on Signal Processing Newsletter


The TMS320 newsletter, Details on Signal Processing, is published quarterly 
to update TMS320 customers on product information and industry trends. It 
covers TMS320 products, documentation, third-party support, application 
boards, mini-application reports, development tool updates, contacts for sup- 
port, design workshops, seminars, conferences, and the TMS320 university 
program. 


To be added to the mailing list, call the Customer Response Center: 


(214) 995-6611 


5.9.3 TMS320 Bulletin Board Service


The TMS320 Bulletin Board Service (BBS) is a telephone-line computer bulle- 
tin board that provides access to information about the TMS320 family. The 
BBS is an excellent means of communicating specification updates for current 
or new TMS320 application reports as they become available. It also serves 
as a means to trade programs with other TMS320 users. 


The BBS contains TMS320 source code from the more than 2000 pages of 
application reports written to date. These programs include macro definitions, 
FFT algorithms, filter programs, ADPCM algorithms, echo cancellation, graph- 
ics, control, companding routines, and sine-wave generators. 


You can access BBS with a terminal or PC and a modem. The modem must 
be able to communicate at a data rate of either 300, 1200, 2400, or 9600 bps. 
A character length of eight bits is required, with one stop bit and no parity. The 
telephone number of the bulletin board is (713) 274-2323. There is a 90-min- 
ute access limit per day on the bulletin board. The BBS is open 24 hours a day. 
ROM-code algorithms may be submitted by secure electronic transfer via the 
TMS320 BBS. 
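
To put the modem settings and the 90-minute daily limit in perspective, the Python sketch below estimates the upper bound on data that can be transferred per day at each supported rate. It assumes standard asynchronous framing, in which each 8-bit character also carries one start bit, giving 10 line bits per character with one stop bit and no parity. The start bit and the absence of protocol overhead are assumptions of this sketch, not statements from the source.

```python
BITS_PER_CHAR = 1 + 8 + 1  # start bit (assumed) + 8 data bits + 1 stop bit, no parity
DAILY_LIMIT_MINUTES = 90   # access limit per day, from the text


def max_bytes_per_day(bps: int, minutes: int = DAILY_LIMIT_MINUTES) -> int:
    """Upper bound on characters transferred in one day's allowance at a line rate."""
    return bps * minutes * 60 // BITS_PER_CHAR


for rate in (300, 1200, 2400, 9600):
    print(rate, "bps:", max_bytes_per_day(rate), "bytes")
```

Real transfers would come in lower than these bounds once file-transfer protocol overhead and line errors are taken into account.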


5.9.4 TMS320 DSP Technical Hotline 
The TMS320 group at Texas Instruments maintains a DSP Hotline to answer
TMS320 technical questions. Specific questions regarding TMS320 device 
problems, development tools, third-party support, consultants, documenta- 
tion, upgrades, and new products are answered. 


The TMS320 DSP Technical Hotline is open five days a week from 8:00 AM 
to 6:00 PM Central Time. It is staffed with engineers ready to provide the sup- 
port needed for your TMS320 design or evaluation. 


To assure the maximum support from this service, first consult your product
documentation. If your question is not answered there, gather all of the
information that applies to your problem. With your information, manuals, and
products close at hand, call:


TMS320 DSP Technical Hotline (713) 274-2320 


For realtime transmission of information, a facsimile machine is available:

    FAX (713) 274-2324

Or you may submit information via electronic mail:

    The Hotline Internet address is 4389750@mcimail.com
    The MCI mail address is 4389750 or TMS320 Hotline


Questions on pricing, delivery, and availability should be directed to the near- 
est TI Field Sales Office. 


5.9.5 TMS320 Application Software 


To simplify development of applications, TI and its third parties offer a wide
variety of software that can be licensed. This software covers a range of DSP
functionality that includes vocoders, speech recognition, modems, audio
coders, and image coders. The software available for license can provide a
head start in the development of your final application. In addition, software
applications that have been published in TI DSP user’s guides and application
books are available via the BBS.


Contact the DSP Hotline for a list of software available for the TMS320C31. 


5.9.6 Design Workshops 


Texas Instruments offers a wide array of up-to-date technical product semi- 
nars and design workshops through its Technical Training Organization (TTO) 
to assist designers in developing the skills needed to implement their ideas 
quickly, produce a quality product, and shorten time to market. Applications as- 
sistance is also offered through local Regional Technology Centers (RTCs). 


The DSP design workshops give design engineers hands-on experience using
the latest TMS320 products, development tools, and design techniques.
These workshops go beyond the standard lecture format. The exercises and
lab experiments start with the basics and move quickly into hands-on
exercises. In these workshops, the student learns by doing, not just listening
or observing. The workshops are designed to help customers shorten the design
cycle, control development costs, and solve design challenges.


Further information on courses and schedules in North America can be ob- 
tained by contacting the TTO Central Registration office at (800) 336-5236, 
ext. 3904. 


5.9.6.1 TMS320C3x Design Workshop 


The TMS320C3x DSP design workshop introduces design engineers to the 
powerful TMS320C3x generation of DSPs. Hands-on, EVM-based exercises 
throughout the course give the designer a rapid start in utilizing TMS320C3x 
design skills. Experience with digital design techniques is desirable. Assembly 
language experience is required. C language programming experience is de- 
sirable. 


Topics covered in the TMS320C3x DSP design workshop include: 

□ TMS320C3x architecture/instruction set
□ Use of the PC-based TMS320C3x EVM
□ Floating-point and parallel operations
□ Use of the TMS320C3x assembler/linker
□ C programming environment
□ System architecture considerations
□ Memory and I/O interfacing
□ TMS320C3x development support


5.9.6.2 Digital Control Design Workshop 
The digital control design workshop covers all the fundamental issues involved
in the design and implementation of physical control systems using TMS320 
DSPs. The workshop is divided into two major parts. The first part covers 
theory and design of control systems and discusses practical aspects that a 
control design engineer should be aware of before attempting to implement a 
controller. The second part is devoted to hands-on experience with 
TMS320C25 DSPs to demonstrate and practice control implementation ex- 
amples. A design and implementation software package is used to test algo- 
rithms on an actual motor positioning system. 


Topics covered in the digital control design workshop include: 


□ System modeling
□ Stability analysis
□ Analysis of numerical problems
□ Quantization effects
□ Truncation, rounding, and scaling issues
□ Sampling rate selection
□ Algorithm structural optimization


5.9.6.3 Applications in C Design Workshop 


The Applications in C design workshop is an advanced C programming
course tailored for practical, hands-on applications using Turbo C
and the TI TMS320C3x C compiler. This course is for hardware and software
engineers with a background in programming and an introductory knowledge 
of C. The course centers around data structure concepts illustrated with appli- 
cation examples. Program examples include file filters, sorting, Huffman cod- 
ing for data compression, memory management, graphics algorithms, and 
other utilities. 


Topics covered in the Applications in C design workshop include: 

□ Review of C language (syntax and conventions)
□ Data structures, constructs, and concepts
□ Optimization and efficiency techniques
□ Arrays and pointers
□ Portability issues
□ Algorithms (FFT, discrete transforms, bit manipulation, etc.)


5.9.7 Design Services 


The TI technical staff can offer applications assistance with customer designs 
through local Regional Technology Centers. Services include: 


□ Design assistance
□ Simulation
□ Emulation

Each Regional Technology Center uses up-to-date development systems,
including workstations and personal computers, plus demonstration, test, and
evaluation equipment. TI staff designers use fully equipped laboratories to
provide efficient design assistance.

The first step to a successful design is an explanation of the project’s
parameters: production requirements, design function(s), and price. The results
of these discussions will allow TI and a customer to explore:

□ Design/cost trade-offs
□ Product implementation options


Once the various trade-offs/options are selected and approved, Texas Instru- 
ments can provide further assistance in the design of a customer’s product, 
sharing a mutual goal of bringing a successful product to market as quickly as 
possible. 


5.9.8 RTC Locations 


The following list gives the worldwide locations of the TI Regional Technology
Centers.

Table 5-1. RTC Worldwide Locations


North American Locations 


ATLANTA
Texas Instruments
5515 Spalding Drive
Norcross, GA 30092
(404) 662-7950

BOSTON
Texas Instruments
950 Winter Street, Suite 2800
Waltham, MA 02154-1263
(617) 895-9196

CHICAGO
Texas Instruments
515 W. Algonquin Road
Arlington Heights, IL 60005
(708) 640-2909

DALLAS
Texas Instruments
7839 Churchill Way
Park Central V, MS 3984
Dallas, TX 75251
(214) 917-3881

INDIANAPOLIS
Texas Instruments
550 Congressional Blvd., Suite 100
Carmel, IN 46032
(317) 573-6400

NORTHERN CALIFORNIA
Texas Instruments
5353 Betsy Ross Drive
Santa Clara, CA 95054
(408) 748-2220

SOUTHERN CALIFORNIA
Texas Instruments
1920 Main St., Suite 900
Irvine, CA 92714
(714) 660-8140

OTTAWA
Texas Instruments Canada, Ltd
301 Moodie Drive, Suite 102
Nepean, Ontario
Canada, K2H 9C4
(613) 726-1970

MEXICO CITY
Texas Instruments de Mexico
Alfonso Reyes 115
Col. Hipodromo Condesa
Mexico, D.F., Mexico 06170
(52) (5) 515-6081
(52) (5) 515-6249


International Locations 


AUSTRALIA
Texas Instruments Australia Ltd.
6-10 Talavera Road, North Ryde
New South Wales, Australia 2113
Tel: (61) (2) 8789000

JAPAN (Tokyo)
Texas Instruments Japan Ltd
Ms Shibaura Building 9F
4-13-23 Shibaura
Minato-Ku, Tokyo, JAPAN 108
Tel: (81) (3) 3769-8700

Table 5-1. RTC Worldwide Locations (Concluded)


International Locations 


BRAZIL
Texas Instruments Electronicos do Brasil Ltda
Av. Eng. Luiz Carlos Berrini
1461-11o. andar
04571 Sao Paulo, SP, Brazil
Tel: (55) (11) 535-5133

FEDERAL REPUBLIC OF GERMANY
Texas Instruments
Deutschland GMBH
Kirchhorster Strasse 2
3000 Hannover 51, FR Germany
Tel: (49) (511) 648021

FEDERAL REPUBLIC OF GERMANY
Texas Instruments
Deutschland GMBH
Haggertystrasse 1
8050 Freising, FR Germany
Tel: (49) (8161) 80-0

FRANCE (Paris)
Texas Instruments France
8-10 Avenue Morane Saulnier
Boîte Postale 67
Velizy Villacoublay Cedex, France
Tel: (33) (13) 0701001

HONG KONG
Texas Instruments Hong Kong Ltd.
8th Floor, World Shipping Centre
7 Canton Road
Kowloon, Hong Kong
Tel: (852) 7351223

ITALY (Milan)
Texas Instruments Italia S.P.A.
Centro Direzionale Colleoni
Palazzo Perseo
Via Paracelso, North 12
20041 Agrate Brianza, MI, Italy
Tel: (39) (39) 63221

JAPAN (Osaka)
Texas Instruments Asia LTD
Osaka Branch
Nissho-Iwai Bldg 5F
2-5-8 Imabashi Chuou-Ku
Osaka, Japan 541
Tel: (81) (6) 204-1881


KOREA
Texas Instruments Korea Ltd.
28th Floor, Trade Tower
159 Samsung-Dong
Kangnam-ku, Seoul
Trade Center P.O. Box 45
Seoul, Korea 135-729
Tel: (82) (2) 5512800

SINGAPORE
Texas Instruments Singapore (Pte) Ltd.
Asia Pacific Division
101 Thomson Road #23-01
United Square
Singapore 1130
Tel: (65) 2519818

SWEDEN
Texas Instruments International Trade
Corporation
Box 30
S-164 93 Kista
Isafjordsgatan 7, Sweden
Tel: (8) 752-5800

TAIWAN
Texas Instruments Taiwan Ltd.
Taipei Branch
10 Floor, Bank Tower
205 Tung Hua N. Road
Taipei, Taiwan 105
Republic of China
Tel: (886) (2) 7139311

UNITED KINGDOM
Texas Instruments Ltd.
Regional Technology Center
Manton Lane
Bedford, England MK41 7PA
Tel: (44) (234) 270111

TMS320C31 Third-Party Support


This chapter lists third-party manufacturers and suppliers alphabetically by 
name and describes their current C31 products. 


The third parties discussed in this chapter include: 


Topic
6.1 Accelerated Technology, Inc.
6.2 A. T. Barrett & Associates, Inc.
6.3 Biomation
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CHS Ste tdoial(s ellis (Cilla) sossosece sence rdsenensereoseee cease 6-14 
6.7 Integrated Motion, Incorporated
6.8 Loughborough Sound Images Ltd.
6.9 Precise Software Technologies Inc.
6.10 Spectron Microsystems Inc.
6.11 Spectrum Signal Processing Inc.
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614 Winthiss:  caesseesseee swore unenies eee sees eee neieaieme ies 6-40 


6.1 Accelerated Technology, Inc. 




P. O. Box 850245 
Mobile, AL 36685 
(800) 468-NUKE
(205) 661-5770 


□ Nucleus RTX


Nucleus RTX is a multitasking executive specifically designed for realtime 
embedded applications using the TMS320C3x microprocessors. Nucleus 
provides applications with advanced realtime facilities that encompass 
management of task execution, task communication and synchronization, 
system resources, predefined memory partitions, and dynamic-length 
memory. 


Nucleus RTX facilities are designed to operate in a consistent, reliable,
and efficient manner. Each task executing under Nucleus has a priority.
When multiple tasks are ready to execute, the task with the highest priority
is executed first. Tasks of the same priority execute in a first-in-first-out
(FIFO) manner. In addition to the many standard realtime facilities,
Nucleus also provides facilities such as task priority modification, task time
slicing, user-defined item sizes for communication queues, suspension
on full queues, suspension on multiple empty queues, both types of
memory management, suspension on unavailable memory, and event
flag consumption. Additionally, any Nucleus task suspension can be given
a maximum amount of time to stay suspended.
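
The scheduling rule described above (highest priority first, FIFO among tasks of equal priority) can be sketched in a few lines of Python. This is a generic illustration of the policy, not the Nucleus RTX API: the class and method names are hypothetical, and the convention that a lower number means a higher priority is an assumption of the sketch.

```python
import heapq
import itertools


class ReadyQueue:
    """Ready list: highest priority first, first-in-first-out within a priority.

    Assumes a lower priority number means a more urgent task (hypothetical).
    """

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves FIFO order

    def make_ready(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._arrival), name))

    def next_task(self):
        """Return the name of the next task to run, or None if none is ready."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = ReadyQueue()
queue.make_ready("logger", 5)
queue.make_ready("filter_a", 1)
queue.make_ready("filter_b", 1)
queue.make_ready("ui", 3)
print([queue.next_task() for _ in range(4)])
```

The arrival counter is what keeps equal-priority tasks in the order they became ready; without it, the heap would fall back to comparing task names.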


Software Products 


Accelerated Technology offers other realtime software products for use 
with the TMS320C3x generation. These include a multitasking debugger, 
a reentrant C library, an MS/DOS-compatible file system, and in the near 
future networking support in the form of TCP/IP protocols. 


The Nucleus debugger provides access to all Nucleus structures in a user- 
readable fashion. Control structures for tasks, queues, semaphores, 
event flags, and memory management are all available for inspection. Ad- 
ditionally, the Nucleus debugger allows you to dynamically execute most 
of the Nucleus RTX service calls. 


The reentrant C libraries supplied by Accelerated Technology provide 
standard ANSI interfaces for all functions, with the exception of file
services (file services are provided by the Nucleus file system). Because the
library routines are fully reentrant, application tasks running under 
Nucleus can use them. 
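
Why reentrancy matters under a preemptive executive can be shown with a small, generic C sketch. The two functions below are hypothetical illustrations and are not part of the Accelerated Technology library: the first keeps its result in a shared static buffer, the second keeps all state in storage supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Non-reentrant: every caller shares one static buffer, so a task that is
   preempted between the call and its use of the result can find the
   buffer overwritten by another task. */
const char *hex8_shared(unsigned long value)
{
    static char buf[9];
    int i;

    for (i = 7; i >= 0; i--) {
        buf[i] = "0123456789ABCDEF"[value & 0xFUL];
        value >>= 4;
    }
    buf[8] = '\0';
    return buf;
}

/* Reentrant: all state lives in the caller's buffer (typically on the
   calling task's own stack), so concurrent callers cannot interfere. */
char *hex8_reentrant(unsigned long value, char *out, size_t out_size)
{
    int i;

    if (out == NULL || out_size < 9)
        return NULL;

    for (i = 7; i >= 0; i--) {
        out[i] = "0123456789ABCDEF"[value & 0xFUL];
        value >>= 4;
    }
    out[8] = '\0';
    return out;
}
```

Two calls to `hex8_shared` return the same pointer, so the second call silently replaces the first result; `hex8_reentrant` avoids this because each caller owns its output buffer.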
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Nucleus File is an MS/DOS-compatible file system that is capable of reading 
and writing standard floppy- and hard-disk formats. Nucleus File is specifically 
designed for embedded applications. 


Accelerated Technology’s realtime software products are primarily written in
ANSI C and are optimized for performance on the TMS320C3x DSPs. All
software products are delivered with complete source code and without any
royalties.


Features of the Nucleus RTX Realtime Multitasking Executive


Realtime multitasking executive for the TMS320C3x DSPs
Complete source code
No royalties
Priority-based scheduling with optional preemption and time slicing
Task communication with user-defined public queues
Item size of each queue defined by user
Optional task suspension on full queues
Optional task suspension on multiple queues
Task synchronization with event flags
Optional consumption of event flags
Resource management with semaphores
Predictable fixed-length memory management
Flexible variable-length memory management
Optional task suspension when memory is unavailable
Optional timeout for any task suspension
System history log
Task performance analysis facilities
Task-oriented debugger
MS/DOS-compatible floppy file system
TCP/IP network support (Q4 92)


Technical Support


Structured and documented source code
Detailed programmer’s reference manual
Detailed internal design manual
Telephone consultation
Warranty and maintenance service
Extensive counseling and contract services


Shipping Media: MS/DOS 5-1/4-inch diskette
Figure 6-1. Realtime Application Tasks
Task Management                 History and Performance


NU_Start()                      NU_Reset_Performance_Timer()
NU_Change_Priority()            NU_Retrieve_Next_History_Entry()
NU_Change_Time_Slice()          NU_Retrieve_Performance_Info()
NU_Control_Interrupts()         NU_Start_History_Saving()
NU_Enable_Preemption()          NU_Start_Performance_Timer()
NU_Disable_Preemption()         NU_Stop_History_Saving()
NU_Relinquish()                 NU_Stop_Performance_Timer()
NU_Sleep()
NU_Stop()
NU_Reset()
NU_Retrieve_Task_Status()
NU_Current_Task_ID()


Clock Management                Fixed-Size Memory Management


NU_Set_Time()                   NU_Alloc_Partition()
NU_Read_Timer()                 NU_Available_Partitions()
                                NU_Dealloc_Partition()


Queue Management                Variable-Size Memory Management


NU_Send_Item()                  NU_Alloc_Memory()
NU_Force_Item_In_Front()        NU_Available_Memory()
NU_Retrieve_Item()              NU_Dealloc_Memory()
NU_Retrieve_Item_Mult()
NU_Retrieve_Queue_Status()


Event Management                Resource Management


NU_Set_Events()                 NU_Request_Resource()
NU_Wait_For_Events()            NU_Retrieve_Resource_Status()
                                NU_Release_Resource()
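
The fixed-size memory services listed above manage predefined partitions. Their argument lists are not given here, so the following sketch does not call them; it only illustrates, with hypothetical names (`pool_t`, `pool_init`, `pool_alloc`, `pool_dealloc`, `pool_available`) and illustrative sizes, why fixed-length partition management is predictable: both allocation and release are constant-time list operations.

```c
#include <assert.h>
#include <stddef.h>

#define PART_SIZE  32   /* bytes per partition (illustrative value) */
#define PART_COUNT 8    /* partitions in the pool (illustrative value) */

/* A free partition stores the link to the next free partition in its own
   storage, so the pool needs no separate bookkeeping table. */
typedef union partition {
    union partition *next_free;
    unsigned char    bytes[PART_SIZE];
} partition_t;

typedef struct {
    partition_t  store[PART_COUNT];
    partition_t *free_head;
    size_t       available;
} pool_t;

void pool_init(pool_t *p)
{
    size_t i;

    p->free_head = NULL;
    for (i = 0; i < PART_COUNT; i++) {      /* push every partition */
        p->store[i].next_free = p->free_head;
        p->free_head = &p->store[i];
    }
    p->available = PART_COUNT;
}

/* Constant-time allocation: pop the head of the free list. */
void *pool_alloc(pool_t *p)
{
    partition_t *part = p->free_head;

    if (part == NULL)
        return NULL;                         /* pool exhausted */
    p->free_head = part->next_free;
    p->available--;
    return part;
}

/* Constant-time release: push the partition back on the free list. */
void pool_dealloc(pool_t *p, void *mem)
{
    partition_t *part = (partition_t *)mem;

    assert(part >= &p->store[0] && part < &p->store[PART_COUNT]);
    part->next_free = p->free_head;
    p->free_head = part;
    p->available++;
}

/* Number of partitions that can still be allocated. */
size_t pool_available(const pool_t *p)
{
    return p->available;
}
```

Because every partition has the same size, the pool cannot fragment, which is the usual reason to prefer it over variable-length allocation in time-critical paths.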
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6.2 A.T. Barrett & Associates, Inc. 
11501 Chimney Rock 
Houston, Texas 77035 
(800) 525-4302 
(713) 728-9688 
FAX: (713) 728-1049 


RTXC, Realtime Kernel for single processor systems
RTXC/MP, Realtime Kernel for multiple processor systems


RTXC and RTXC/MP are fully preemptive, priority-driven, realtime kernels 
written in ANSI C that enable you to tap the full power of the TMS320C3x 
processors in realtime environments. Released in 1985, RTXC has been 
continuously upgraded. 


Demonstration and benchmark disks on RTXC and RTXC/MP are avail- 
able free of charge. An evaluation package containing a full kernel, a spe- 
cial user’s manual, and special utilities to assist in evaluation of the kernel 
is also available. The package gives you the complete picture of the capa- 
bilities, performance, scalability and ease of use of these realtime kernels. 


RTXC and RTXC/MP are available for a one-time site license fee. All con- 
figurations of processor and compiler bindings include full source code 
and require no runtime royalties. Most compilers are supported. 


Together, RTXC and RTXC/MP address a broad range of
applications. RTXC is aimed at embedded applications, which would
typically use a single TMS320C2x, ’C3x, or ’C5x DSP. RTXC/MP is targeted at
applications employing multiple TMS320C3x or ’C4x processors.


RTXC and RTXC/MP share many of the same attributes and components. 
Most importantly, both kernels use a similar application program interface 
(API). However, RTXC/MP extends the RTXC API to include the functions
that are necessary for the special requirements of the multiprocessing
environment. The API provides a wide range of kernel services such
as task management, timer management (including timeouts), intertask
communication and synchronization, memory and resource management,
and processor-specific services. Intertask communication can occur
via semaphores, messages, and FIFO queues. Because of the
commonality of the API, software developed for the RTXC single-processor
system is highly portable to the multiprocessing world of RTXC/MP.


A set of high-end utilities helps you configure, compile, and fine-tune the
application. Both kernels use a system generation utility, RTXCgen, which
permits interactive definition of the system components, tasks, queues,
semaphores, memory partitions, and mailboxes. RTXCgen maintains the
user-defined list of all application or topology-dependent attributes. For
example, resizing of a memory partition requires only the regeneration of 
the C source file for memory partitions and no changes in the application 
source code. RTXCgen automatically monitors changes made to the sys- 
tem component definitions. When directed to generate C source code for 
system tables, RTXCgen also produces header files only for those system 
components that have been changed. Thus, RTXCgen promotes concor- 
dance between the source code, representing the specified components 
of the application, and the header files used for referencing members of 
that application. In addition, RTXCgen provides listings of all system com- 
ponents that serve as a primary source for system-level documentation. 


A system-level debug utility, RTXCbug, is also common to both kernels. 
RTXCbug examines the current state of the tasks, queues, and sema- 
phores and presents a coherent picture, or snapshot, of the interaction be- 
tween the system and the application tasks. It even permits manual task 
management. 


RTXC/MP includes two special utilities not found in the single-processor
RTXC kernel. The first monitors the system and provides, on demand, a list
of the last 256 scheduled events, permitting you to trace the immediate
history of the application. The second utility, a built-in workload monitor,
measures and redistributes the workload at runtime.
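
A history of the last 256 scheduled events is naturally kept in a circular buffer. The sketch below shows that technique only; the record layout and the names `trace_t`, `trace_init`, `trace_record`, and `trace_snapshot` are hypothetical, since the actual RTXC/MP trace format is not described here.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_DEPTH 256   /* the text cites the last 256 scheduled events */

/* Hypothetical event record; the real trace format is not described. */
typedef struct {
    unsigned long tick;
    int           task_id;
} trace_event_t;

typedef struct {
    trace_event_t events[TRACE_DEPTH];
    size_t        next;     /* slot the next event will overwrite */
    size_t        count;    /* valid entries, saturates at TRACE_DEPTH */
} trace_t;

void trace_init(trace_t *t)
{
    t->next = 0;
    t->count = 0;
}

/* Record an event, overwriting the oldest entry once the buffer is full. */
void trace_record(trace_t *t, unsigned long tick, int task_id)
{
    t->events[t->next].tick = tick;
    t->events[t->next].task_id = task_id;
    t->next = (t->next + 1) % TRACE_DEPTH;
    if (t->count < TRACE_DEPTH)
        t->count++;
}

/* Copy up to max entries into out[], starting from the oldest one;
   returns the number of entries copied. */
size_t trace_snapshot(const trace_t *t, trace_event_t *out, size_t max)
{
    size_t n = (t->count < max) ? t->count : max;
    size_t start = (t->next + TRACE_DEPTH - t->count) % TRACE_DEPTH;
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = t->events[(start + i) % TRACE_DEPTH];
    return n;
}
```

Recording costs a fixed number of operations per event regardless of how long the system has been running, so the trace can stay enabled without disturbing realtime behavior much.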


RTXC and RTXC/MP address two important problems. First, the use of 
ANSI standard C protects you from technology changes, thus preserving 
the software development investment. The easy upgrade path from a 
single processor version of RTXC to the multiple processor version of 
RTXC/MP ensures that the software investment is future proof. Second, 
the difficulties of parallel or distributed programming become less
problematic through RTXC/MP’s use of a virtual single-processor model. The
implementation is geared towards maximum performance so that hard 
realtime constraints are still satisfied even in a multiple-processor system 
architecture. 




RTXC Specifics


With an implementation history dating from 1978, RTXC provides a sound 
foundation for the solution of complex realtime systems. It is based on the 
concept of preemptive multitasking that permits a system to make efficient 
use of both time and system resources. RTXC is distributed in three 
source code configurations defined by the set of kernel services embodied 
in each. The different configurations are available to meet the real needs 
of the embedded systems marketplace where there is a wide diversity of 
functional capabilities required in a realtime kernel. RTXC allows you to 
license the source code library that most closely fits your needs. If you 
need more capabilities later on, there is a simple upgrade path. 


The three source code libraries, basic, advanced, and extended, are
compatible with each other. All of the services in the basic library are included
in the advanced library. All of the advanced library is part of the extended 
library. If you obtain a license to the basic library, you can upgrade to either 
the advanced or extended library without changing the application pro- 
grams developed with the basic library. 


RTXC/MP Specifics


The range of applications is vast, from single-processor embedded
systems to complex control systems with various degrees of fault-tolerance
and using tens of processors. Throughout the spectrum of applications, 
RTXC/MP provides transparent distributed realtime processing without 
the need to change any line of application source code when changing at- 
tributes of system resources (for example, the location of tasks, queues, 
semaphores, memory blocks, and priority of tasks). 


The transparency simply means that any cluster of processors can be
regarded as a single realtime-processing engine. While processors give you
scalable computing power, RTXC/MP gives you scalable realtime
software. Transparency is achieved by the implementation of a virtual
single-processor model. The model uses a global naming scheme in which all
system resources are known system wide. The use of the global naming
scheme relies on the embedded router in RTXC/MP. The RTXC/MP router,
which supports up to 64K processor nodes and 64K tasks, is attractive for
pure communication applications. The routing tables, automatically
generated by RTXCgen from the link connections table, allow you to write all
communication between tasks as if they were located on the same
processor. Under high communication loads, prioritized handling in the router
keeps lower priority messages from blocking higher priority messages.


While the single processor kernel (RTXC) can be used with multiple pro- 
cessors if you define your own communication protocols, the distributed 
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version frees you from this burden. Moreover, because RTXC/MP uses a 
message-based mechanism, ports to common-memory, local-memory, 
and LAN-based systems can easily be done. 


A distributed I/O library and a graphics server are also available for RTXC/MP.


The design philosophy behind RTXC/MP has proven to be a major step 
forward to shield software applications from technology changes. It offers 
a future-proof environment for the transparent development of scalable 
realtime software on scalable processor hardware. 




6.3 Biomation 
19050 Pruneridge Ave. 
Cupertino, CA 95014 
(800) 944-2466 
FAX: (408) 988-1647 


CLAS 2000 and CLAS 4000 Logic Analyzers


The CLAS 2000 and CLAS 4000 Logic Analyzers provide measurement
capability for examining high-speed CISC, RISC, ASIC, and general-logic
designs. Features include:


96-channel module with 50/100/200 MHz capture
Measurement widths of up to 384 channels
Configurations with 1 to 4 logic analyzers per CLAS 4000
Configurations with 1 or 2 logic analyzers per CLAS 2000
Full-speed triggering with multilevel trace control
Time-stamped transitional recording
Disassembling of all DSP instructions
Full-speed operation for clock and data rates
Monitoring of every ’C3x signal with a single-probe connection
Small interface probes for dense boards
Reliable high-speed probing
Timing and state measurements made through the processor probe
Full symbolic display and triggering for address, data, and control groups
Support for multiprocessor systems


Operation


Operation is quick and simple. To connect to your target, just install the 
probe board between the ’C3x CPU and its socket. Click on the icon repre- 
senting the ’C3x disassembler setup and the entire logic analyzer will be 
configured automatically. The setup assigns channels to all of the CPU’s 
signals, arranges the channels into address, data, and status groups, and 
sets up the clocking for the ’C3x. Predefined trigger patterns are also
provided so that you can quickly specify which samples are captured.


Display


Data captured on the CLAS can be viewed simultaneously in several win- 
dows with each window displaying the data in different formats. Results 
are displayed as symbolic, hex, octal, and binary radices in a state win- 
dow; as waveforms in a timing window; and as decoded mnemonics in a 
disassembly window. Display radices can be added or changed at any 
time without taking a new measurement. 
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Decoded instructions for the TMS320C3x processor are displayed in the 
disassembly window. The MAP hardware is capable of capturing all bus 
cycles. The ’C3x must be executing out of external RAM in order for the
disassembler to operate effectively. Four disassembly display modes are
available: display all bus cycles, delete non-executed cycles, delete data
read/writes, and display executed code only. These modes allow the
display to be tailored to your needs. Hardware engineers will appreciate
“Display All Bus Cycles,” while “Display Executed Code Only” will look
much like the program listing to which a software engineer is accustomed
(with symbolic labels for addresses).


Passive Interface 


Biomation uses passive interfaces in microprocessor probe adapters. 
Passive interfaces bring the processor signals directly to the logic analyz- 
er’s high-impedance data probes. Direct connection to the CPU allows 
timing measurements to be made directly through the probe. Where load- 
ing is critical, clock signals have an active buffer on the probe board to en- 
sure proper operation of the system under test. 


Specifications 


Signals Monitored: Two 96-channel pyramid measurement modules per 
CPU support full TMS320C3x disassembly. Additional pyramid modules 
can be added to monitor other system signals. 


Input Impedance: The input impedance of all signals is 1 MΩ shunted
by 8 pF except STRB, RDY, MSTRB, IOSTRB, XRDY, and H1. Input
impedance on these signals is approximately 500 kΩ shunted by 16 pF.


Sampling


External clock: DC to 50 MHz
Internal clock: 100 ms to 5 ns
Setup time: 7.0 ns typical (reduced to 4 ns with timebase sync probe)
Hold time: 0 ns


Power 


All MAP power is provided by the CLAS chassis. No power is required from
the target system. 


Mechanical 


Connection to the target is made using a 190-pin PGA package (15 x 15 
grid) mounted on the MAP probe adapter. The probe adapter is placed be- 
tween the CPU and its socket. A zero-insertion-force (ZIF) socket is in- 
cluded, but can be removed when space is limited. 




Probing Considerations


The MAP probe adapter is made as small as possible to allow an easy con- 
nection when other chips are mounted next to the CPU. The probe adapter 
extends a maximum of 1.5 cm (0.6 in) from the chip on the sides and 8.6 
cm (3.4 in) along the back. 


Miscellaneous


Size:         Interface Box   4.0 cm (1.6 in) high,
                              21.3 cm (8.4 in) wide,
                              22.9 cm (9.0 in) deep

              Probe Adapter   2.1 cm (0.8 in) high (with ZIF),
                              6.5 cm (2.5 in) wide,
                              13.7 cm (5.4 in) long

              Cable           34 cm (13.5 in) long

Weight:       0.8 kg (1.75 lb) with cables and probe adapter

Temperature:  0-50°C, noncondensing
6.4 Byte-BOS
P.O. Box 3067
Del Mar, CA 92014
(800) 788-7288 or
(619) 755-8836


Byte-BOS Multitasking Operating System


Byte-BOS Multitasking Operating System (BOS) is a low-cost,
full-featured, realtime preemptive multitasking operating system available
for TMS320 DSPs. Byte-BOS brings the cost of multitasking within reach
of all embedded software applications by providing a common code base
across a wide range of processors, including the TMS320C3x DSPs. BOS
consists of a C library of realtime multitasking functions with the following
features:


Preemptive and nonpreemptive prioritized task scheduling
Task control and management
Timer management
Event synchronization
Message passing
Resource management
Serial I/O management
Interrupt stack and nested interrupt handling
Low power management
Function timeout, blocking, and nonblocking return
TMS320 on-chip timer and serial port integration
Application code for TMS320 embedded platform
External UART serial I/O management (add-on library)
Fixed block memory management (add-on library)
Multiple programmable event timers (add-on library)
Multiple message buffers (add-on library)
BOSVIEW realtime operating system view port (add-on library)
Library and applications code-compiler batch and make files
Comprehensive reference manual with many examples
Prototype and test TMS320 BOS applications on a PC
Source code site license (unlimited product usage)
No royalty executable code distribution
One year of technical support and revision updates


BOS is optimized for all TMS320 DSPs and has excellent performance. 
BOS is configured to work with the Texas Instruments C development sys- 
tems and includes a working application. 




6.5 Computer Motion, Inc. 
270 Storke Rd., Suite 11 
Goleta, CA 93117 
(805) 685-3729 
FAX: (805) 685-9277 


C++ Compiler


Computer Motion Inc. has introduced object-oriented programming using
C++ for the TI TMS320C30 and TMS320C31 DSPs. The compiler is
based on the GNU C++ retargetable compiler and executes on
SPARCstation platforms. The compiler translates programs directly to
TMS320 assembly language. The TI assembler and linker can then be
used to create the final executable code. The object code generated from
the assembly language output can be linked with other programs compiled
with the TI C compiler and with the runtime-support libraries. The package
includes documentation manuals and a quarter-inch cartridge tape that
contains both a C++ and a C compiler.
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6.6 Electronic Tools GmbH 


Zum Blauen See 7
4030 Ratingen
Germany
0049-2102-88010
FAX: 0049-2102-880123


miniKit-320C31 Embedded DSP System 


miniKit-320C31 is a complete embedded DSP system based on the Texas
Instruments TMS320C31 and is no larger than a credit card.
The module addresses two significant areas of DSP-based system
design: it can be used either as a fully functional development system on
which algorithms can be rapidly implemented and debugged or as a
module that is easily integrated into any user’s end system. The module is
particularly attractive for low- to medium-volume embedded solutions
requiring a fast turnaround time, as it can be designed into any industrial
product just like a large IC. This proven platform, manufactured in SMD
technology, offers a number of standardized interfaces that allow full
access to all of the DSP’s features. Compatibility is guaranteed with other
products of Electronic Tools’ miniKit range. Debugging is performed on a
PC with the TI db30 source-level debugger, which is linked to
miniKit-320C31 via a small PC controller board and the emulation port of the
TMS320C31. A rich set of software utilities ensures that all steps, from
algorithm implementation in C or assembler code right down to programming
miniKit’s boot EPROM, can be achieved on the fly.


Credit card sized DSP system: 85 mm x 61 mm
TMS320C31 (33 MHz)
128K x 32 zero-wait-state static RAM
64K x 8 boot EPROM; booting possible via EPROM, host
   interface, RAM, or serial interface
Watchdog timer
Power failure detection
Battery backup
miniBus interface: standardized 16-bit parallel bus for attaching
   peripherals
HostBus interface: standardized 8-, 16-, 32-bit parallel interface for
   attaching microcontrollers; also available for bit I/O
ExpansionBus interface: TMS320C31-specific bus (32-bit parallel) for
   expanding memory and attaching peripherals
Serial interface
Timer interface
Emulation interface
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Integrated Motion, Incorporated 


6.7 Integrated Motion, Incorporated
758 Gilman Street
Berkeley, California 94710
(510) 527-5810
FAX: (510) 527-7843


MX31 Modular Embedded System


The MX31 is a low-cost, modular, small-footprint general-purpose em- 
bedded controller with expansion daughter boards designed for applica- 
tions involving motion control. The system is based on a motherboard/ 
daughter board architecture for flexibility and low cost. The motherboard is 
a processor unit consisting of a 33-MHz TMS320C31 floating-point DSP, 
ROM, RAM, and other support devices. Each daughter board provides the 
peripherals required to control a two-axis servo-actuated mechanical sys- 
tem. Up to four daughter boards can be stacked in a single system to con- 
trol up to eight servo axes. 


Motherboard features


33-MHz TMS320C31 floating-point DSP
16- to 256K-word ROM
Up to 256K-word zero-wait-state RAM
RS232 serial port
16-bit parallel I/O


Daughter board features


2-channel, 16-bit shaft encoder interface
2-channel, 16-bit analog output
12-bit digital input, 6-bit digital output
Up to 32K bytes nonvolatile RAM
All digital I/O optically isolated


Daughter boards for other applications such as binary image acquisition
and general-purpose I/O are currently under development. A
serial-port-based software monitor program is available to aid with the
development of embedded control algorithms.
Figure 6-2. MX31 Fitted With a Preliminary CCD Camera Interface Daughter Board
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The Technology Centre 
Epinal Way, Loughborough 
Leicestershire, LE11 0QE
England 
(44) 509 231843 


TMS320C31 PC/AT Embedded DSP Board


The PC/C31 is a 3/4 length PC/AT-compatible board intended for em- 
bedded signal processing and control applications. The board’s architec- 
ture gives complete access to all of the TMS320C31’s facilities and adds a 
variety of peripheral interface options. 


The PC/C31 is ideal for a wide range of embedded applications, from
realtime closed-loop control to online signal processing. The features of the
high-performance, low-cost, 32-bit floating-point TMS320C31 make it
ideally suited to application areas not previously considered. When the board
is coupled with LSI’s range of peripherals, complete application systems can
be assembled quickly and easily.


Features include 


Complete TMS320C31 processing system 
Small, 3/4 length PC/AT board format 
Boot EPROM for standalone operation 
Zero wait-state SRAM up to 640K words 
Dual-port SRAM host interface 

High quality on-board analog interfaces 
Uprated DSPLINK parallel bus expansion 
Comprehensive software support 


The board format has been designed to the familiar PC/AT specification to 
ease initial evaluation and development work. Existing users of LSI 
TMS320C30 products can quickly transfer code to the PC/C31 to imple- 
ment a target system. The 3/4 length board format aids in keeping occu- 
pied space to a minimum. True standalone operation is achieved by the 
use of the boot EPROM. Using the built-in boot loader of the ’C31, the
board can be configured to self-initialize and begin execution of
applications.


The wide range of zero-wait-state SRAM options, from 32K to 640K 
words, allows any size of system to be specifically configured for the re- 
quired application. From an intelligent microcontroller in industrial use to a 
multitasking signal processing design, all can be accommodated in a high- 
speed solution. The 2K-word dual-port memory host interface allows rapid 
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communication, allowing a host PC to transfer data to and from the 
PC/C31 without halting the DSP. This facility is a great asset in systems 
that use both the DSP and the host machine in a dual-processing arrange- 
ment, where efficient communication between the two is needed. 


The PC/C31 is fitted with two of LSI’s daughter module sites, giving it ac- 
cess to the high-quality interfaces that make up the daughter module 
range. This presently comprises both delta-sigma and successive
approximation devices and is continually expanding. Using the currently
available successive approximation modules, it is possible to construct a
4-input/4-output analog system with a maximum sampling frequency of
200 kHz on the inputs and 500 kHz on the outputs. The modules are designed
for quality of conversion. Signal-to-noise and distortion figures of 90 dB for 
the delta-sigma part have been measured with modules mounted on DSP 
boards and placed within a PC. 


Parallel expansion is provided by an updated version of LSI’s DSPLINK 
interface standard. The bus provides a standardized interface to all of 
LSI’s DSP boards and allows the use of a range of readily available periph- 
eral boards including multichannel analog I/O and AES/EBU pro-audio 
digital interfaces. The DSPLINK specification is published, allowing users 
to easily interface a custom design to the bus. Improvements to the origi- 
nal DSPLINK include a 32-bit data bus and additional address lines. 


Code development support will be provided by the Texas Instruments 
floating-point DSP tools that include an optimizing ANSI C compiler, as- 
sembler, and linker. These tools cover the whole TI floating-point DSP 
range, making upgrades or changes to/from other devices a simple mat- 
ter. 


Debug of DSP code is supported by LSI’s command line MON31 and Win- 
dows 3.0-compatible View31. Both provide a comprehensive range of de- 
bug features. View31 allows multiple-board debug sessions, and the win- 
dows display is configurable to meet the needs of the debug session. Sev- 
eral memory areas can be viewed simultaneously while multiple-register 
windows let you view just the registers of interest. 


The LSI high-level language interface library allows the integration of the 
DSP functionality into the host PC. Functions are provided to control and 
pass data to and from the board, and the libraries are provided in both 
Microsoft and Turbo C formats. 
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6.9 Precise Software Technologies Inc. 
301 Moodie Drive, Suite 308 


Nepean, Ontario 
Canada, K2H 9C4 
(613) 596-2251 
(613) 596-6713 


Precise/MPX Realtime Multiprocessor Executive 


Realtime embedded control applications are increasingly being solved by
using DSPs instead of CISC-based 16- and 32-bit processors. The
benefits of using DSPs are increased performance, simpler designs, and
cost-effective multiprocessor applications. The TMS320C3x devices are cost
effective for many embedded applications such as voice or data
communications controllers, LAN controllers, peripheral controllers, laser
printers, and biomedical devices. Applications that require additional
processors to handle high throughput, high interrupt rates, or building-block
flexibility can easily use two or more TMS320C3x DSP chips to make simple,
easy-to-use multiprocessor systems.


The maximum capabilities of the hardware can be realized by using the
Precise/MPX executive. Precise/MPX is a library of primitives that are
used by a realtime software designer to extend the C language to a
realtime concurrent C language with transparent support for multiprocessor
applications. Designing applications using a concurrent programming
model is the simplest and most natural paradigm for expressing a realtime
problem in terms of a high-level programming language, and is the basis
for modern programming languages such as Ada, C++, Objective-C, and
Smalltalk. The Precise/MPX kernel has been designed such that the
benefits of this programming paradigm can be successfully applied to
realtime embedded controller applications. These capabilities are provided in
a very efficient ROMable kernel that typically requires only 16K bytes.
Additional benefits of using Precise/MPX are


Portability: the concurrent paradigm is hardware independent

Reusability: task objects communicate with other task objects or
physical interrupts via specified interfaces

Scalability: any application that uses Precise/MPX can be mapped
from one to any number of DSPs without any change to the application
software and no increase in the kernel overhead (in fact, the overhead
decreases).
Concurrent Program Development


Precise/MPX provides over 90 primitives to support program develop- 
ment. These can be grouped into the following major categories: 


Task management 
Inter-task communication 
Interrupt management 
Memory management 
Server management 


A software designer uses the Precise/MPX tasking model, interrupt man- 
agement primitives, and inter-task communications primitives to solve a 
realtime problem by breaking it down into concurrent tasks that
communicate via well-defined messages. A task is simply a C language function
implemented as an iterative loop. Inter-task communication primitives pass
messages between tasks and implicitly provide concurrency, which sim- 
plifies realtime design and implementation. 


Task Management


Precise/MPX has the capability to completely manage the state of
tasks while an application is executing. This capability is especially
important for realtime applications that require recovery or
reconfiguration, or that have resource limitations.


Application tasks are defined to the Precise/MPX kernel through a 
data structure which specifies priority, stack size, and the symbolic 
name of the first function of the task. All application tasks except for 
“main” tasks are managed explicitly by the application using the 
_Create() and _Destroy() task management primitives. 


Tasks are very lightweight. A task context is maintained in a 128-byte 
task descriptor. An application can _Create() any number and any 
type of tasks subject only to available memory. 


After system initialization, the Precise/MPX kernel will _Create() a 
user-specified “main” task and dispatch this task. The “main” task is 
written by the user to create and dispatch all remaining components 
of the realtime application. 


Once a task has been created, it will execute subject to its own priority 
and the actions it performs. Task switching occurs only when a task 
executes a Precise/MPX primitive that “readies” a higher priority task 
or when an interrupt event readies a higher priority task. 


Inter-Task Communication


Inter-task communication and task synchronization are supported 
with messages passed between tasks. A software designer usually 
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uses the _Send(), _Receive(), or _Reply() primitives for message 
passing. These three primitives are the core interface to the Precise/ 
MPX executive. 


The structure of a design is represented by how the application uses 
inter-task communication. _Send() is used to send a message to 
another task and cause the kernel to ready that task and run it. 
_ Receive() is used by a task to request that a message be sent to it 
and cause the kernel to ready another task. _ Reply() is used to issue 
a response from a receiving task to a sending task and to ready the 
sending task. Thus, with three simple primitives a designer can speci- 
fy all inter-task communication and all scheduling required for a con- 
current application. 
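
The way these three primitives drive scheduling can be sketched as a small state model. This is not the Precise/MPX API: the ipc_ names and the single-message mailbox are hypothetical, and the sketch assumes that a sending task stays blocked until the receiver replies, which the description of _Reply() suggests but does not state outright.

```c
#include <assert.h>

/* Toy model of send/receive/reply scheduling; all names are hypothetical. */
typedef enum { TASK_READY, TASK_RECEIVE_BLOCKED, TASK_REPLY_BLOCKED } ipc_state_t;

typedef struct {
    ipc_state_t state;
    int has_msg;   /* nonzero when a message is waiting for this task */
    int msg;       /* the waiting message (one slot only in this model) */
    int sender;    /* task that sent it, or -1 */
    int reply;     /* last reply delivered to this task */
} ipc_task_t;

#define IPC_NTASKS 4
ipc_task_t ipc_tasks[IPC_NTASKS];

void ipc_reset(void)
{
    for (int i = 0; i < IPC_NTASKS; i++) {
        ipc_tasks[i].state = TASK_READY;
        ipc_tasks[i].has_msg = 0;
        ipc_tasks[i].msg = 0;
        ipc_tasks[i].sender = -1;
        ipc_tasks[i].reply = 0;
    }
}

/* Receive: block unless a message is already waiting. */
void ipc_receive(int self)
{
    if (!ipc_tasks[self].has_msg)
        ipc_tasks[self].state = TASK_RECEIVE_BLOCKED;
}

/* Send: deliver the message, ready the receiver, block the sender. */
void ipc_send(int self, int dest, int msg)
{
    ipc_tasks[dest].msg = msg;
    ipc_tasks[dest].has_msg = 1;
    ipc_tasks[dest].sender = self;
    ipc_tasks[dest].state = TASK_READY;
    ipc_tasks[self].state = TASK_REPLY_BLOCKED;
}

/* Reply: hand the response back and ready the original sender. */
void ipc_reply(int self, int value)
{
    int sender = ipc_tasks[self].sender;
    assert(sender >= 0);               /* a reply needs a pending sender */
    ipc_tasks[sender].reply = value;
    ipc_tasks[sender].state = TASK_READY;
    ipc_tasks[self].has_msg = 0;
    ipc_tasks[self].sender = -1;
}
```

In this model a server task calls ipc_receive(), a client calls ipc_send(), and the server's ipc_reply() makes the client ready again, so the message traffic alone determines which task can run.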


Interrupt Management 


Precise/MPX supports dynamic direct connection to interrupts. Inter- 
rupts can be either exceptions generated by the DSP or external de- 
vice interrupts. The software designer is responsible for writing the in- 
terrupt service routine, called the notifier. Notifiers can be implement- 
ed in C or in assembly language. Interrupts and notifiers can be de- 
fined during executive initialization or they can be installed by any task 
during execution. 


Notifiers are equivalent to tasks except they do not require the over- 
head of tasks and are not scheduled by the executive. A task that is 
ready to receive an interrupt uses the _Await_interrupt() primitive. A 
notifier needs only to perform two actions to reply to a waiting task. 
First, it calls _Task_awaiting_interrupt() to determine which task is
waiting. Then, it calls _Add_ready() which readies the waiting task. 
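
The two-step notifier pattern can be sketched with a simple model of the waiting and ready lists. The irq_ functions below are hypothetical stand-ins written for this illustration; the signatures of the real _Await_interrupt(), _Task_awaiting_interrupt(), and _Add_ready() primitives are not given here and are not assumed.

```c
#include <assert.h>

#define IRQ_NVECTORS 4
#define IRQ_NTASKS   8
#define IRQ_NO_TASK  (-1)

static int irq_waiter[IRQ_NVECTORS]; /* task waiting on each vector, if any */
static int irq_ready[IRQ_NTASKS];    /* 1 when the task is ready to run */

void irq_model_init(void)
{
    for (int v = 0; v < IRQ_NVECTORS; v++) irq_waiter[v] = IRQ_NO_TASK;
    for (int t = 0; t < IRQ_NTASKS; t++)   irq_ready[t] = 1;
}

/* Model of a task waiting for an interrupt: it leaves the ready list. */
void irq_await_interrupt(int task, int vector)
{
    irq_waiter[vector] = task;
    irq_ready[task] = 0;
}

int  irq_task_awaiting(int vector) { return irq_waiter[vector]; }
void irq_add_ready(int task)       { irq_ready[task] = 1; }
int  irq_is_ready(int task)        { return irq_ready[task]; }

/* Notifier: find the waiting task, then make it ready again. */
void example_notifier(int vector)
{
    int task = irq_task_awaiting(vector);
    if (task != IRQ_NO_TASK) {
        irq_waiter[vector] = IRQ_NO_TASK;
        irq_add_ready(task);
    }
}
```

The notifier itself does no scheduling; it only moves the waiting task back to the ready list, which matches the statement that notifiers are not scheduled by the executive.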


Memory Management 


Precise/MPX includes a dynamic memory manager that tasks use to 
allocate extra temporary or private memory areas exclusive of the 
tasks’ stack. The memory management algorithm is first fit on request.
On release, it groups together the nearest neighbors to minimize 
memory fragmentation. 
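
A minimal sketch of such an allocator follows, assuming that the allocation policy is first fit and that "nearest neighbors" means the free blocks immediately before and after the released block. The pool size, the block table, and the pool_ function names are all hypothetical; the actual Precise/MPX implementation is not described here.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_WORDS 64
#define MAX_BLOCKS 16

/* Blocks are kept in address order so neighbors are adjacent in the table. */
typedef struct { size_t start; size_t len; int used; } block_t;

static block_t blocks[MAX_BLOCKS];
static size_t  nblocks;

void pool_init(void)
{
    blocks[0].start = 0;
    blocks[0].len = POOL_WORDS;
    blocks[0].used = 0;
    nblocks = 1;
}

/* First fit: take the first free block that is large enough. */
long pool_alloc(size_t len)
{
    if (len == 0) return -1;
    for (size_t i = 0; i < nblocks; i++) {
        if (blocks[i].used || blocks[i].len < len) continue;
        if (blocks[i].len > len && nblocks < MAX_BLOCKS) {
            /* Split: shift later entries up and record the remainder. */
            for (size_t j = nblocks; j > i + 1; j--) blocks[j] = blocks[j - 1];
            blocks[i + 1].start = blocks[i].start + len;
            blocks[i + 1].len = blocks[i].len - len;
            blocks[i + 1].used = 0;
            blocks[i].len = len;
            nblocks++;
        }
        blocks[i].used = 1;
        return (long)blocks[i].start;
    }
    return -1; /* no free block is large enough */
}

static void remove_block(size_t i)
{
    for (size_t j = i; j + 1 < nblocks; j++) blocks[j] = blocks[j + 1];
    nblocks--;
}

/* Release a block and merge it with free neighbors on either side. */
int pool_free(long start)
{
    for (size_t i = 0; i < nblocks; i++) {
        if (!blocks[i].used || (long)blocks[i].start != start) continue;
        blocks[i].used = 0;
        if (i + 1 < nblocks && !blocks[i + 1].used) {
            blocks[i].len += blocks[i + 1].len;
            remove_block(i + 1);
        }
        if (i > 0 && !blocks[i - 1].used) {
            blocks[i - 1].len += blocks[i].len;
            remove_block(i);
        }
        return 0;
    }
    return -1; /* not an allocated block */
}

size_t pool_block_count(void) { return nblocks; }
```

Merging on release means that once every allocation has been freed, the pool collapses back into a single free block regardless of the order in which the blocks were released.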


Server Management 


Precise/MPX includes primitives that support Client/Server design 
paradigms. The client/server model is a powerful design method for 
developing robust reusable applications for communications and pe- 
ripheral controllers. Clients and servers are Precise/MPX tasks. The 
only difference is that a server is created with the _Server_create() 
primitive, and after it is created, it initializes itself differently. Part of this
initialization is registering the server’s service with a registry so that
any client task can use the server. 


I/O Components


Precise/MPX is augmented with optional I/O software components
that support the following services:


SDLC

LAPB

MIL-STD-1553

TCP/IP


These components are written almost entirely in C and are completely re- 
usable for any new hardware configuration. 


Multiprocessing 


The Precise/MPX kernel has been designed to support various commonly 
used multiprocessor hardware configurations. It is a unique technology, 
due to the support for multiprocessor applications using DSPs or mixes of 
DSP and non-DSP processors. 


Precise/MPX has been successfully used on multiprocessors based upon
VMEbus and NuBus hardware consisting of two to 20 microprocessors
and using the parallel backplane as a high-speed interconnection
network. It has also been used in proprietary hardware applications where
three to nine microprocessors are interconnected with memory or
high-speed serial data interfaces. In all cases, the applications software
has been designed independently of the underlying hardware or
interconnection network, and the designer was able to reconfigure the
application to take advantage of the number and type of processors used in
the hardware without having to change the design or any applications
source code.


FAX: (805) 683-4995


SPOX Architecture


SPOX is a highly modular and configurable runtime environment that sup- 
ports the ’C3x hardware platforms and can be integrated with application 
programs targeted for these systems. While it provides most of the func- 
tionality found in many realtime executives used with general-purpose mi- 
croprocessors, SPOX has been specifically designed for the more de- 
manding environment of TMS320C3x-based DSP systems: 


Extensive numeric computation 

Realtime I/O 

High-frequency data rates 

Limited program memory 

Multi-DSP system architectures 

Integration with an adjoining host computer 


Because of its modular software architecture, SPOX can address a wide 
range of DSP applications—telecommunications, imaging, speech and 
audio, test and measurement, and multimedia to name a few—without 
compromising system functionality and performance. The SPOX runtime
environment can be reduced to as little as a few thousand words of code for
small embedded applications requiring only a limited number of kernel
functions. SPOX can also be integrated into a more comprehensive
environment that supports larger applications executing a variety of
numerically intensive algorithms and performing system control and
communication functions.


Figure 6-3. SPOX Architecture


Figure 6-3 depicts the overall architecture of SPOX, illustrating its major
functional capabilities along with their organization into the following
distinct software components:


SPOX OS is the foundation of SPOX. It provides a set of system
capabilities that include memory management supplying dynamic
allocation of arrays from multiple memory segments; hardware
interrupt handling; control of multiple realtime tasks executing within a
single program; and a uniform device-independent stream I/O
interface to platform-specific drivers that manage peripherals used for
system I/O and communications. It serves as the foundation for the
remaining application libraries and system components.

SPOX LIBC is a standard C runtime library that provides
rudimentary file I/O capabilities on the DSP or seamless integration
with an adjoining host computer’s file system.

SPOX MATH is a comprehensive library of optimized DSP math
functions that operate on vectors, matrices, and filters.

SPOX DBUG extends the capabilities of DSP C source debuggers,
such as the Texas Instruments db30, to simplify the development of
realtime multitasking SPOX OS applications. It allows developers to
perform debug and profile functions from within the C debugger.

SPOX MP is a set of software functions that provide a foundation for
multi-DSP applications. These include interprocessor communication
primitives, management of shared memory, and the ability to reassign
tasks across processor boundaries.


Realtime Multitasking


The SPOX OS offers all of the features typically found in other realtime 
multitasking kernels: 


Preemptive, event-driven scheduling 
Dynamically prioritized tasks 
Synchronization and communication facilities 
Timer services 

Handling of device interrupts 


By offering these features, SPOX OS enables realtime-multitasking ap- 
plications typically relegated to general-purpose microprocessors to 
execute on the DSP. Older configurations with 16-bit DSPs, used as slave 
processors controlled by a more intelligent, general-purpose master, can 
now be replaced by single-chip 32-bit DSP solutions. Thus, SPOX man- 
ages multiple tasks executing numerically-intensive algorithms in parallel 
with other system control and communication functions. 


Memory Management, Device-Independent I/O, and Host Communication


While numerical processing may dominate DSP applications, memory al- 
location, I/O, and communication are equally vital when turning a theoreti- 
cal algorithm into a practical application. Where the data is located in 
memory and how this data is input or output have just as much effect on 
overall system performance as does the algorithm itself. 


Using the SPOX memory management functions, application programs 
create individual array objects whose respective data buffers can be
dynamically allocated and freed during the course of execution. Unlike the
standard C functions malloc() and free(), the SPOX array functions
enable the application to supply a parameter specifying the segment of
memory in which these buffers will reside. Since production DSP hard- 
ware platforms typically contain a hierarchy of memory types (on-chip 
RAM, external SRAM, bulk DRAM, etc.) retaining explicit control over the 
location of data becomes essential to meeting realtime constraints in 
many applications. 
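
The idea of passing a memory segment to the allocator can be sketched as follows. This is not the SPOX array interface: the segment names, the capacities, and the seg_alloc() function are hypothetical and serve only to show how an extra segment parameter lets the application decide where a buffer resides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory segments, fastest first. */
typedef enum { SEG_ONCHIP, SEG_SRAM, SEG_DRAM, SEG_COUNT } segment_t;

/* Illustrative capacities in words; real values depend on the hardware. */
static const size_t seg_capacity[SEG_COUNT] = { 2048, 65536, 1048576 };
static size_t seg_used[SEG_COUNT];

/*
 * Reserve 'words' words in the chosen segment.
 * Returns the offset within that segment, or -1 if it does not fit.
 */
long seg_alloc(segment_t seg, size_t words)
{
    if ((int)seg < 0 || (int)seg >= SEG_COUNT)
        return -1;
    if (words > seg_capacity[seg] - seg_used[seg])
        return -1;
    long offset = (long)seg_used[seg];
    seg_used[seg] += words;
    return offset;
}
```

With an interface of this shape, a filter's coefficient and delay buffers could be requested from the on-chip segment while bulk capture buffers are placed in slower external memory.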


SPOX OS supports device-independent I/O, meaning that a uniform set of
I/O operations is mapped onto an otherwise diverse set of devices. The
high-level nature of device-independent I/O operations provides a
consistent programming interface for a number of off-the-shelf device
drivers for accessing and controlling each device within the system and
insulates applications from the low-level details of managing these devices.


SPOX OS also provides a mechanism for adding platform-dependent
drivers: software modules that encapsulate low-level hardware details by
interpreting device-independent I/O requests in a device-dependent
fashion. Device drivers are the key to customizing SPOX for a particular
system environment, and to ensuring portability of SPOX applications from
one system to the next.
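
One common way to build a device-independent layer of this kind is a table of driver entry points that the application reaches only through a uniform call. The sketch below illustrates that general technique; the driver_t type, the two example drivers, and stream_submit() are hypothetical and do not represent the SPOX driver interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver record: a name plus a device-specific handler. */
typedef struct {
    const char *name;
    int (*submit)(const int *buf, size_t n);
} driver_t;

static long   dac_sum;    /* stands in for samples written to a converter */
static size_t null_count; /* counts samples discarded by the null device */

static int dac_submit(const int *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) dac_sum += buf[i];
    return (int)n;
}

static int null_submit(const int *buf, size_t n)
{
    (void)buf;
    null_count += n;
    return (int)n;
}

static const driver_t drivers[] = {
    { "dac",  dac_submit  },
    { "null", null_submit },
};

long   dac_total(void)  { return dac_sum; }
size_t null_total(void) { return null_count; }

/* Uniform entry point: the caller names a device, not a driver function. */
int stream_submit(const char *device, const int *buf, size_t n)
{
    for (size_t i = 0; i < sizeof drivers / sizeof drivers[0]; i++) {
        if (strcmp(drivers[i].name, device) == 0)
            return drivers[i].submit(buf, n);
    }
    return -1; /* no driver installed under that name */
}
```

Porting an application to new hardware then means adding or replacing table entries, while calls to the uniform entry point stay unchanged.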


Unlike virtually every other operating system or realtime executive, the de- 
vice-independent I/O interface supported by SPOX does not include a 
read() or write() function in the traditional sense. Rather than mandating
one pair of general-purpose functions for all input and output, SPOX
allows for a broader set of I/O operations optimized for two fundamentally 
different forms of program interaction with underlying devices found in 
realtime DSP systems: 


Asynchronous data streaming, in which the program and device are in
a producer/consumer relationship, and


Synchronous message passing, in which the program and device are
in a client/server relationship.


C Runtime Environment 


The SPOX application libraries include many of the standard functions 
which are typically not implemented by C compilers targeted for DSP pro- 
cessors. Included among these are the routines comprising the C stdio 
library together with other standard functions requiring operating system 
support: 


Opening/closing named files (fopen, fclose, ...) 
Reading/writing byte streams (getc, putc, ...)
Formatted I/O (printf, scanf, ...) 

Utility functions (system, time, ...) 

Program termination (exit, abort, ...) 

Memory management (malloc, free, ...) 


By furnishing these functions, the SPOX application libraries enable many 
standard C programs normally run on a host computer under UNIX or 
MS-DOS to be literally recompiled and executed faster on attached DSP 
hardware. 


DSP Math Functions 


SPOX furnishes over 100 standard math functions that can be used as
building blocks for algorithms employed in advanced DSP applications,
such as:


Vector functions: arithmetic and logical operations, dot product,
convolution, correlation, FFT, windowing, LPC analysis

Matrix functions: arithmetic and logical operations, row and column
manipulation, matrix multiplication, 2-D FFT

Filter functions: FIR, IIR, and LMS adaptive filtering


The goal of the SPOX math library is to allow DSP application developers 
to write as much of their program in C as possible without sacrificing over- 
all system performance. To accomplish this goal, all SPOX math functions 
are optimized in assembly language. Just as importantly, they are tightly 
integrated into the SPOX memory management and I/O system so that 
critical data operated on by the math algorithms is situated in the appropriate
memory, and the overhead incurred in exchanging data between I/O 
streams and math algorithms is kept to a minimum. 


Multiprocessing Systems 


SPOX addresses the needs of multi-DSP applications with a set of func- 
tions that extend the multi-tasking, I/O, and memory management capabil- 
ities of SPOX OS from a uniprocessor to a multiprocessor architecture. In 
a SPOX multiprocessing system, a copy of SPOX-OS is required at each 
node of the system to manage local resources such as tasks and memory.
The following independent software modules are provided:


Inter-task communication application programming interface (API)
Multiprocessor global shared memory manager

Shared memory interprocessor resource locks

On-chip peripheral support


Debug Support 


The C source debugger can provide the following debug and profile capa- 
bilities via additional runtime support to the SPOX OS and extensions to 
the debugger, as shown in Figure 6—4: 


Display of SPOX OS objects 

Set task-specific breakpoints 

Monitor and display system performance characteristics 
Invoke SPOX OS system calls


Figure 6-4. SPOX Debug Support


SPOX Products


SPOX products that are generally used by application developers and 
system integrators include: 


Software Components for Embedded Systems 


For customers who build and develop realtime embedded DSP sys- 
tems, SPOX is offered as a suite of software components (shown in 
Figure 6—3) which can be configured and customized for the custom- 
er’s hardware. 


Application Library Packages 


All major suppliers of plug-in DSP boards offer the complete library of 
SPOX application functions for C runtime environment, realtime 
stream I/O, DSP math, and host-DSP communication. These SPOX 
application library packages are transforming PCs and workstations 
into signal-processing systems that integrate the flexibility of a host 
computer with the power of attached DSP hardware. 


SPOX Evaluation Kit 


The SPOX EVM evaluation system provides DSP system developers 
with a low-cost, easy-to-use solution for evaluating the SPOX system 
kernel on a TMS320C3x hardware platform. The SPOX EVM product
streamlines the evaluation process by integrating all of the necessary
hardware and software components into a single turnkey package: 


TMS320C3x EVM hardware platform
TMS320C3x C compiler and assembly language tools
SPOX-OS software


SPOX EVM can also serve as a development platform for building rap- 
id prototypes of new DSP systems. All application software developed 
initially under SPOX EVM can later be reused on any production hard- 
ware platform using SPOX. 


Open Signal Processing Architecture (OSPA)


While improvements in application productivity and portability are proven 
benefits of SPOX, the true power of a standard software interface to un- 
derlying DSP hardware comes with bringing together a wide range of inter- 
operable products. With SPOX serving as the common thread, application 
developers and system integrators not only can apply these products to- 
ward solving today’s problems, but are also afforded a bridge to future 
DSP technologies through the SPOX OSPA (Open Signal Processing Ar- 
chitecture). Figure 6—5 depicts the OSPA framework for interoperability. 


Figure 6-5. Open Signal Processing Architecture

[Figure: layered block diagram. Labels: Applications; Laboratory Systems; Design Tools; Audio Library; Image Library; Speech Library; Telecommunications Library; Compilers, Debuggers; SPOX Tools; Host Operating System; SPOX; Device Drivers.]


Board-Level Products


SPOX is rapidly proliferating across a wide variety of board-level
products targeted for current and emerging bus architectures (VMEbus,
NuBus, EISA, SBus, etc.), allowing developers to buy off-the-shelf
DSP platforms and data-acquisition boards rather than building
custom hardware.


Program Development Tools


SPOX supports development of application programs using high-lev- 
el languages including C and Ada. Several source-level debuggers 
are also being enhanced with knowledge of the SPOX runtime envi- 
ronment. 


DSP Function Libraries 


A growing number of vendors are offering “platform-independent”
DSP functions ranging from SPOX-compatible math libraries for
audio or image processing to complete implementations of system
capabilities such as FAX/modem, speech recognition, and image
compression.


Integrated Host Applications 


To facilitate integration of host application programs with realtime DSP
software, Spectron provides host computer software that transparently
controls and communicates with SPOX tasks executing realtime
algorithms on attached DSP hardware.


6.11 Spectrum Signal Processing Inc.


250 ’H’ Street 
P.O. Box 8110-25 
Blaine, WA 98230 
(800) 663-8986 


FAX: (604) 438-3046 




DSP/PC Single-Board Computer 


Designed for applications such as multimedia, the DSP/PC single-board 
computer integrates PC technology with DSP technology on a full-size 
IBM AT plug-in card. A 25-MHz 80386 provides a 100% PC/AT-compatible
platform for running DOS programs such as Microsoft Windows, Lotus 
1-2-3, and Hypersignal Workstation, while a TMS320C31 provides up to 
33 MFLOPS of DSP power. 


Features include 


2 megabytes of System DRAM, expandable to 8 megabytes 
High-performance SCSI interface with 32-bit bus-mastering DMA 
controller 

Dual floppy disk controller 

Two serial RS-232 ports 

Parallel/printer port 

Realtime clock calendar 

Keyboard and speaker ports 

TMS320C31 32-bit floating-point DSP 

Media_Link high-speed bus expansion connector 


DSP-Link Peripherals 


Spectrum’s DSP-Link peripherals are compatible with the DSP-Link
system expansion interface and can be connected to any DSP system or
processor board. DSP-Link specifications are available for custom
interfacing.


Following are brief descriptions of Spectrum DSP-Link peripherals:


4-Channel Analog I/O Board: Four 12-bit input channels (58 kHz/
channel) with quad synchronous sample-and-hold, two 12-bit output
channels, third-order low-pass resistor-programmed filters on input
and output, DSP-Link data transfer interface.

32-Channel Analog Input Board: 32 12-bit input channels (7 kHz/
channel) with 4-channel synchronous sample-and-hold, 32 first-order
low-pass resistor-programmed input filters, 32 input buffer amplifiers,
DSP-Link data transfer interface.

Pro-Audio Board: AES/EBU interface, 48-/44.1-/32-kHz clock, word
sync, DSP-Link data transfer interface.

Pro-Audio Board: AES/EBU interface, SONY PCM interface, MIDI
interface, 16x16 cascadeable RAM, 48-/44.1-/32-kHz clock, word
sync, DSP-Link data transfer interface.

DSP-Link Prototype Module: DSP-Link slave wire-wrap interface for
easy design of custom peripherals, buffered data, decoded address,
R/W strobes.

DSP-Link Dual-Processor Communications Module: Allows two
processors to communicate via DSP-Link.


Monroeville, PA 15146
(412) 856-3600 
FAX: (412) 856-3636 


Tartan Compilers


Tartan, Inc. develops full-function Ada optimizing compilation systems for 
the TMS320C3x and TMS320C4x DSPs. The compiler targeted to the 
C30 has been validated by the U.S. Government’s Ada Compiler Valida- 
tion Capability under test suite version 1.11. 

Standard components of the compilation systems are: 

Highly optimizing compiler 

Ada Librarian 

Small, modular runtimes 

Standard, predefined Ada packages 

ARTClient package permitting access to tasking data structures and
operations

Intrinsics package permitting access to hardware capabilities 

Math package of elementary functions 

Cross-reference facility 

AdaScope debugger 

Linker, object librarian, and utilities 

Help facility and documentation 


The Ada compiler produces fast, compact code through Ada-specific
optimizations, optimizations that take advantage of the processor’s
architecture features, and a full range of classical optimizations. Five
optimization levels permit proper optimization strategy at each point in the
development cycle.


Support for Ada language features includes:


Representation specifications for type sizes, record layout,
enumeration values, object addresses, and interrupt entries

Unchecked deallocation and conversion

Insertion of routines written in machine code

All Ada predefined pragmas and the implementation-defined
pragmas Foreign_body and Linkage_name


’C3x- and ’C40-specific features include:


Access to many processor-specific native instructions
Circular and bit-reversed addressing

Delayed branch functionality

Repeat-block and repeat-single instructions


Compiler switches permit generation of 16-bit PC-relative conditional call
instructions, control of interrupt latency time using the RPTS instruction, 
and specification of the number of wait states for the memory in which the 
program is executed. 


The Tartan Ada Librarian implements the Ada language requirements for 
separate compilation and dependency control. It supports multiple li- 
braries and multiple accesses. It also permits usage of non-Ada object 
files within an Ada program. 


The Tartan linker is a fast, flexible linker for embedded Ada programs. It 
supports precise control over placement of code, data, and constants for 
individual packages, modules, sections, and subprograms in memory. It 
eliminates unused program sections from the executable program 
images, including as much of the highly modularized Tartan Ada runtimes 
as possible. An interface to the Texas Instruments TMS320C3x
cross-assembler is also provided, including conversion of the output to
Tartan’s object file format.


The Tartan AdaScope debugger provides complete window-oriented, 
source-level, symbolic, and assembly-level debugging for Ada programs 
using Ada-like commands. It operates remotely from the host system to 
the DSP processor, using the TI XDS500 controller, or it can be run
entirely on the host using the simulator.


The Tartan Ada compilation systems can be hosted on either the Digital 
Equipment Corporation VAX series equipment running the VMS operating 
system (version 5.2 or later) or on the Sun SPARC platforms running the 
SunOS operating system (version 4.1.1 or later).


Available options include an interface to Spectron’s SPOX-DSP vector, 
matrix, and filter math functions; TI simulator; facilities for customizing the 
runtimes; and the AdaScope retargeting kit to adapt to a different hard- 
ware configuration or communications protocol. 


Ada Compiler for the TMS320C30 


Tartan’s Ada compiler for the SMJ320C30, the military version of the 
TMS320C30, supports VAX/VMS and Sun’s SPARC systems. The
compiler implements Ada as defined in ANSI/MIL-STD-1815A-1983 and is
validated under the latest DOD ACVC test suite 1.11.




Tartan Ada C30-targeted compilation systems produce highly optimized 
application code that runs on the TMS320C30 processors. The compila- 
tion system consists of: 


Full-function optimizing Ada compiler

Tartan Ada Library that implements the Ada language requirements
for separate compilation and dependency control

Tartan Ada Runtime System, including precompiled standard Ada
packages for I/O and other facilities and precompiled C30-specific
packages

Tartan cross-reference facility, TXREF

Tartan Ada Runtime Client Package, ARTClient, allowing on-site
customizing of the runtime

Library of elementary math and trigonometric functions that fully
meets the specification of the SIGAda Numerics Working Group and
the Ada-Europe Numerics Working Group

AdaScope, the Tartan Ada source-level, symbolic debugger

Tartan Tool Set, consisting of the Tartan Ada linker, object file librarian,
file conversions, and other utilities

Online help files for the compiler and library interfaces and AdaScope
commands


The Ada compiler produces fast, compact code through Ada-specific op- 
timizations, optimizations that take advantage of ’C30 architecture fea- 
tures, and a full range of classical optimizations. Five optimization levels 
permit proper optimization strategy at each point in the development 
cycle. 


Code size is further reduced by Tartan’s compact, modular runtimes that 
include only the runtime functionality needed by the application in the ex- 
ecutable image. The Tartan linker reduces code size still further by elimi- 
nating unused program sections from the executable image. 


’C30-Specific Features


Access to many ’C30 native instructions 
Circular addressing 

Bit-reversed addressing 

’C30 delayed branch functionality
Repeat-block and repeat-single instructions 
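
For readers unfamiliar with the two addressing modes listed above, the index sequences they produce can be modeled in portable C. This is only a software illustration with hypothetical function names; on the processor the address arithmetic is performed by the hardware, and the compiler's interface to it is not shown here.

```c
#include <assert.h>

/* Reverse the low 'bits' bits of 'index' (e.g. 3 bits for an 8-point FFT). */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    for (unsigned i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

/* Advance an index through a buffer of 'length' elements, wrapping at the end. */
unsigned circular_next(unsigned index, unsigned step, unsigned length)
{
    return (index + step) % length;
}
```

For an 8-point FFT, bit_reverse(i, 3) for i = 0 through 7 visits the indices 0, 4, 2, 6, 1, 5, 3, 7. The circular form is the pattern used to step through a filter delay line without copying samples.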


Compiler switches permit generation of 16-bit PC-relative conditional call 
instructions, control of interrupt latency time using the RPTS instruction, 
and specification of the number of wait states for the memory in which the 
program code is executed.


Ada Language Features


Representation specifications
Unchecked deallocation and conversion
Insertion of routines written in machine code


Available options include an interface to the Spectron SPOX-DSP vector, ma- 
trix, and filter math functions; TI simulator; facilities for customizing the run- 
times; and the AdaScope hardware interface. 


Figure 6-6. AdaScope Debugger Screen


6.13 Tektronix
P.O. Box 500
Beaverton, OR 97077
(800) 835-9433
(503) 627-7111


Tektronix offers realtime, symbolic debugging support for TMS320
development with their comprehensive line of logic analyzers, including the
DAS9200 and PRISM 3000. Tektronix logic analyzers provide powerful
fault-triggering capabilities coupled with comprehensive mnemonic
disassembly support, including performance, state, timing, and analog
analysis for hardware, software, and integration applications. They are ideal
for the testing and debugging of algorithms on TMS320 hardware. See
Figure 6-7.


DAS9200


Realtime symbolic debugging 
Support of up to 5000 symbols from your compiler/assembler 
with LA-LINK 

Four disassembly display modes 
8K, 32K, 128K trace buffers 
Automatic fetch prediction 
200-MHz state analysis

2-GHz timing analysis 

100-MHz pattern generation 

Time correlation of up to ten DSPs 
Hard disk for storage


PRISM 3000


Realtime symbolic debugging 
Support of up to 1500 symbols from your compiler/assembler
with LA-LINK 

Realtime performance analysis 
Four disassembly display modes 
Automatic fetch prediction 
200-MHz timing analysis 

Time correlation of up to four DSPs 
Choice of lab or field-portable units 
Integrated digital scope module 
Hard disk for storage 


Figure 6-7. Logic Analyzer Family


1240/1241 Logic Analyzer


Tektronix supports TMS320 development on their 1240/1241 Logic Ana- 
lyzer. The 1240/1241 Logic Analyzer provides complete state and timing 
analysis support for hardware, software, and integration applications. It is 
ideal for the testing and debugging of algorithms on TMS320 hardware. 
Powerful triggering, dual timebase, and mnemonic disassembly make the
1240/1241 a valuable tool for developing processor-based products.


6.14 Wintriss
4715 Viewridge, #200 
San Diego, CA 92123 
(800) 733-8089 


EVB Evaluation Board


The WECO EVB is a complete, low-cost, PC/AT TMS320 evaluation 
board. Models are available for the ’C31. 


The EVB contains a wire wrap area for system prototyping purposes and
provides full access by standard PC I/O functions. Dual-ported memory provides for
convenient communications. Full debug monitor software is included for 
dynamic debugging. 


EVB features include 


1-M static RAM 

Wire wrap area 
Dual-port memory 
Dynamic debug software 
C compiler 

Up to 40-MHz operation


TMS320 DSP Family


Digital signal processors are programmable microprocessors designed for 
speed and flexibility. While they provide functionality similar to traditional mi- 
croprocessors, they are distinguished by architectural differences which opti- 
mize their ability to quickly process complex mathematical formulas. 


This appendix describes the evolution of the DSP market and the role of TI in
this market. The TMS320 roadmap and a description of each generation of
devices are also presented.


A.1 The DSP Market


Over the last decade, DSP technology has made new products possible and 
many applications affordable. In the early 1980s, DSPs provided an off-the- 
shelf alternative to custom chips and bit-slice processors. They quickly won 
acceptance in high-performance applications such as military systems. High- 
volume applications such as modems soon followed, as the cost of TI DSPs 
declined dramatically. A processor costing $500 in 1982 now costs $5 (quanti- 
ty 1)—and as little as $3 in volume. Similar price reductions will transform for- 
mer niche applications such as multimedia into a widespread standard in the 
near future. 


In addition to lower prices, improvements in ease-of-use and increased sys- 
tem integration have enabled DSPs to displace traditional microcontrollers in 
many applications. As systems become more numeric intensive, the DSP al- 
ternative is increasingly attractive. Evidence of this trend can be seen in semi- 
conductor manufacturers’ attempts to incorporate DSP-like functionality into 
traditional controllers. 


DSPs are clearly moving into the mainstream. The evidence suggests that 
DSPs will be to the 1990s what general-purpose microprocessors were to the 
1970s and 1980s. 


A.2 The TI Role in the DSP Industry


Advanced technology products and extensive development support have 
made Texas Instruments a dominant force in the DSP industry. 


TI has played a vital role in educating new users and has made a substantial
investment in new product development since patenting their first digital signal 
processor in 1982. In a dedicated effort to train upcoming designers in DSP 
technology, TI provided students and professors at more than 200 universities
with resources to study the technology and offer suggestions for improve- 
ments and new applications. University work, along with efforts of third-party 
developers, helped define new applications far beyond the niche markets of 
the early 80s. 


A broad application base led to significant cost reductions by 1987, because 
the higher volume enabled more efficiencies through mass production.
Continuous advances in fabrication process technology contributed to low-cost mass
production and enabled TI to incorporate numerous functions on a single DSP. 
As the number of functions performed by a single processor increased, prod- 
ucts could be designed to be lightweight and portable, which made the DSP 
appeal to a growing number of consumer OEMs. Texas Instruments’
world-class development support led to shorter design cycles and contributed to the
progress in customer product technologies. The market exploded. 


Today more than 10,000 designers have gained the benefits that TMS320 
DSPs bring to applications. More than 100 independent software and
hardware third parties support the development of products incorporating
TI DSPs. TI also offers seminars and workshops on product applications and
assists potential customers who want to incorporate DSPs in their products.


Tl is firmly committed to the future of DSP, and will continue to develop new 
devices and applications that will drive technology into the next century. 
A.3 The TMS320 Product Roadmap


The TMS320 family of 16-/32-bit single-chip digital signal processors com- 
bines the flexibility of a high-speed controller with the numerical capability of 
an array processor, offering an inexpensive alternative to microcontrollers, 
custom VLSI, and bit-slice processors. 


The combination of the TMS320’s high degree of parallelism and its specialized
digital signal processing instruction set provides the speed and flexibility to
produce a CMOS microprocessor family that is capable of executing up to 50
MFLOPS or 275 MOPS. The TMS320 family optimizes speed by implementing
functions in hardware that other processors implement through software
or microcode. This hardware-intensive approach provides the design engineer
with power previously unavailable on a single chip. The newest TI generation
of floating-point DSPs, the TMS320C4x, is designed for high-performance,
parallel-processing applications.


The TMS320 family consists of five generations (three fixed-point and two
floating-point) of digital signal processors. The fixed-point devices are members
of the TMS320C1x, TMS320C2x, or TMS320C5x generation, and the floating-point
devices belong to the TMS320C3x or TMS320C4x generation.
Figure A-1 shows the TMS320 family. Table A-1 provides a tabulated overview
of each member’s memory capacity, number of I/O ports (by type), cycle
time, package type, technology, and availability.


Many features are common among these TMS320 processors. When the term 
TMS320 is used, it refers to all five generations of DSP devices. When refer- 
ring to a specific member of the TMS320 family (e.g., TMS320C15), the name 
also implies enhanced-speed in MHz (-14, -25, etc.), erasable/programmable 
(TMS320E15), low-power (TMS320LC15), and one-time-programmable 
(TMS320P15) versions. Specific features are added to each processor to pro- 
vide different cost/performance alternatives. Software compatibility is main- 
tained throughout the family to protect your investment. Each processor has 
code-generation, system integration, and debug tools to facilitate the design 
process. 


Figure A-1. TMS320 Device Evolution

Table A-1. TMS320 Family Overview

† Military version available/planned; contact the nearest TI Field Sales Office for availability.

‡ Ser = serial; Par = parallel; DMA = direct memory access concurrent with CPU operation (Int = internal; Ext = external); Com = parallel
communication ports.

Sixteen of these parallel I/O ports are memory-mapped.

# Single logical memory space for program, data, and I/O; minus on-chip RAM, peripherals, and reserved spaces.

|| Includes the use of serial port timers.

* Dual buses.

§ Contains an on-chip bootloader ROM.


Note: Programmed transcoders (TMS320SS16 and TMS320SA32) are also available.
Table A-2. TMS320 Family Features and Benefits

Features                                                   Benefits

Five generations of more than 25 compatible devices.       DSP to meet any application
                                                           need.

Cycle times as fast as 35 ns.                              Realtime DSP performance.
Choice of fixed-point or floating-point devices.
Hardware multiplier and barrel shifters.
Modified Harvard architecture.
Concurrent DMA, program cache.

On-chip data RAM up to 8.5K words, program ROM/EPROM       Reduced system cost, space,
up to 4K words.                                            and power consumption.
Serial port, timer, multiprocessor interface,
instruction cache, DMA controller.
CMOS processing.

Large memory space up to 4 gigawords.                      Multiple DSP programs on a
                                                           single chip.

General-purpose and DSP-specific instructions.             Ease of design.
EPROM and OTP versions.                                    Fast time to market.
High-level language support.
Operating system support.
Extensive development support.

JTAG IEEE test bus.                                        System reliability.
Serial scan path for 99% fault grading.
A.4 TMS320C1x


The TMS320C1x DSPs provide cost-effective solutions for many needs. 
TMS320C1x DSPs perform a multiply command at least 30 times faster than 
a general-purpose microprocessor. An on-chip hardware multiplier allows the 
TMS320C1x to produce results in a single instruction cycle. Instruction cycle 
times range from 160 to 280 ns. Higher performance is achieved through inter- 
nal parallelism and a unique Harvard architecture, which allows program fetch 
to overlap data operations. The ’C1x generation includes DSPs optimized for 
specific high-performance applications such as speech synthesis, high-speed 
modems, and telephone systems. All TMS320C1x devices are software com- 
patible for easy upgrade as application requirements change. TMS320C1x 
ROM-code versions can be used to reduce system costs. On-chip serial ports, 
companding hardware, and a coprocessor interface make the TMS320C17 
ideal for telecommunications applications. 


The TMS320C14 has been optimized for control applications such as disk
drives and servo control. The ’C14 is the industry’s first device to combine the
high performance of a DSP with the on-chip peripherals of a microcontroller.
Operating at 25.6 MHz, the TMS320C14 offers five to ten times the speed of
traditional 16-bit microcontrollers and can execute advanced control
algorithms (such as Kalman filters and state controllers) for analog-type
performance. On-chip peripherals (such as an event manager with PWM, bit I/O,
watchdog timer, serial port, and baud rate generator) reduce chip count,
resulting in space and cost savings.


With 4K words of on-chip EPROM, the TMS320E15, ’E17, and ’E14 support
realtime code development.
A.5 TMS320C2x


The TMS320C2x DSPs offer from two to four times the performance of the 
’C1x devices. Since the TMS320C2x devices are source-code compatible with 
TMS320C1x DSPs, they provide an ideal upgrade path for the world’s largest 
installed base of signal processors. The TMS320C2x DSPs offer instruction 
cycle times as fast as 80 ns, two to four times the amount of on-chip RAM, larg- 
er external memory reach (160K), multiprocessor capabilities, and several 
additional application-specific instructions and addressing modes. The ’E25 
offers 4K words of on-chip EPROM for realtime code development and proto- 
typing ease. The TMS320C2x ROM versions can be used for system cost re- 
duction. The ’C2x DSPs vary in instruction time and memory size and type. 
Specifically, the TMS320C25-50 supports 50-MHz (80-ns) operation. The 
TMS320C26 offers 1.5K words of on-chip data RAM, 256 words of on-chip 
ROM, and up to 128K words of data/program RAM. 
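
The pairing of a 50-MHz clock with an 80-ns instruction cycle quoted above can
be checked with simple arithmetic. The sketch below assumes that one ’C2x
instruction cycle spans four input-clock periods; that divider is consistent
with the figures quoted here but is not stated in this appendix, and the
function name is purely illustrative.

```python
def instruction_cycle_ns(clock_mhz: float, clocks_per_cycle: int) -> float:
    """Return the instruction cycle time in nanoseconds."""
    if clock_mhz <= 0 or clocks_per_cycle <= 0:
        raise ValueError("clock frequency and divider must be positive")
    clock_period_ns = 1000.0 / clock_mhz  # one input-clock period in ns
    return clock_period_ns * clocks_per_cycle


# ASSUMPTION: four input-clock periods per instruction cycle on the 'C2x.
C2X_CLOCKS_PER_CYCLE = 4

print(instruction_cycle_ns(50.0, C2X_CLOCKS_PER_CYCLE))  # 80.0
```

The same function can be reused for other generations by supplying the
appropriate divider, which should be taken from the device data sheet.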


A.6 TMS320C3x 


The TMS320C3x DSPs incorporate floating-point arithmetic and offer the
features of a supercomputer on a single chip, executing more than 33 MFLOPS.
High performance is gained through large on-chip memories (2K words of
RAM and 4K words of ROM), a concurrent DMA controller, and an instruction
cache (64 words). Two serial ports, two timers, a DMA controller, and large
on-chip system memory are achieved by using a high-density CMOS process
incorporating 700,000 transistors. This high level of on-chip integration reduces
system cost, space, and power requirements. Because the ’C3x devices are
floating-point DSPs, numbers no longer need to be scaled, which simplifies
code development. Future ’C3x devices will support applications needing faster
cycle times, lower cost, and extreme temperature and reliability
characterization. TMS320C3x development is supported by high-level language
compilers (C and Ada) and the SPOX realtime operating system. Scan-based
emulation is possible through a unique on-chip serial scan path, which
provides access to all chip registers.
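
To illustrate why floating-point arithmetic removes the scaling step, the
sketch below contrasts a fixed-point multiply with the same product computed
in floating point. The Q15 format (1 sign bit, 15 fractional bits) and all
helper names are assumptions chosen for illustration; they are not taken from
this document or from any TI library.

```python
Q15_SCALE = 1 << 15  # ASSUMPTION: Q15 format, 15 fractional bits


def to_q15(x: float) -> int:
    """Scale a value in [-1, 1) to a saturated 16-bit Q15 integer."""
    n = int(round(x * Q15_SCALE))
    return max(-Q15_SCALE, min(Q15_SCALE - 1, n))


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 integers and rescale the product back to Q15."""
    product = (a * b) >> 15  # product has 30 fractional bits; drop 15
    return max(-Q15_SCALE, min(Q15_SCALE - 1, product))  # saturate


def from_q15(n: int) -> float:
    """Convert a Q15 integer back to a real value."""
    return n / Q15_SCALE


a, b = 0.5, 0.25
fixed = from_q15(q15_mul(to_q15(a), to_q15(b)))  # scale, multiply, rescale
floating = a * b                                 # no scaling required
print(fixed, floating)  # 0.125 0.125
```

In the fixed-point path the programmer must track the scale factor through
every operation and guard against overflow; in the floating-point path the
hardware carries the exponent.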
A.7 TMS320C4x


The TMS320C4x DSPs are the world’s first floating-point DSPs designed for
parallel processing. The ’C4x devices include 40- and 50-MHz versions. The
’C4x CPU executes a single-cycle floating-point instruction in 40 ns (50-MHz
version) or 50 ns (40-MHz version), delivering 275 or 225 MOPS and 320 or
256 Mbytes/sec, respectively. Six communication ports provide direct
interprocessor and processor-to-I/O communication. A self-programmable
six-channel DMA coprocessor maximizes sustained CPU performance. The 512-byte
instruction cache, together with two independent 32-bit memory interfaces,
supports shared-memory configurations. The ’C4x 40-MHz version is designed for
slower-speed DSP applications that would benefit from the attributes of a
lower-priced, floating-point TMS320C40 processor.


A.8 TMS320C5x 


The TMS320C5x DSPs are the industry’s highest-performance fixed-point
DSPs. Designed to execute an instruction in 35 ns, the ’C5x is upwardly
software compatible with all ’C1x and ’C2x DSPs, providing a fast performance
upgrade path. Fast cycle times, large on-chip memories, a parallel logic unit
(PLU), zero-overhead context switching, and block repeats differentiate the
TMS320C5x. The ’C5x has two serial ports, which can operate in normal or
time-division-multiplexed (TDM) modes. The integration of the JTAG IEEE test bus
standard increases system reliability, allowing 99% fault-grade testing and
on-chip emulation. Spin-off devices can be developed rapidly because of the
modular design of the ’C5x.
Part Ordering Information


This chapter provides the device and support tool part numbers. Table B-1
lists the part numbers for the TMS320C30 and TMS320C31, and Table B-2
gives ordering information for TMS320C3x hardware and software support
tools. An explanation of the TMS320 family device and development support
tool prefix and suffix designators follows the two tables to assist you in
understanding the TMS320 product numbering system.


The topics covered and their page numbers include: 


Topic                                                            Page
B.1  Part Numbers .............................................. B-2
B.2  Device and Development Support Tool Prefix Designators .... B-4
B.3  Device Suffixes ........................................... B-5
B.1 Part Numbers


Table B-1. TMS320C3x Digital Signal Processor Part Numbers


Device            Technology     Frequency   Package Type            Dissipation

TMS320C31PQL40    0.8-µm CMOS    40 MHz      Plastic 132-pin QFP
[illegible]       1.0-µm CMOS                Ceramic 181-pin PGA
SMJ320C30HUM28                   28 MHz      [illegible]
SMJ320C30HTM28
[illegible]       1.0-µm CMOS                Ceramic 181-pin PGA
SMJ320C30HUM25                   25 MHz      [illegible]
SMJ320C30HTM25
TMS320C31PQA      0.8-µm CMOS    33 MHz      Plastic 132-pin QFP     1.00 W


Table B-2. TMS320C3x Support Tool Part Numbers 


Tool Description                          Operating System       Part Number


Software

C Compiler & Macro Assembler/Linker       VAX VMS                TMDS3243255-08
                                          PC-DOS/MS-DOS          TMDS3243855-02
                                          SUN UNIX†              TMDS3243555-08
                                          MAC-MPW                TMDS3243565-01

Macro Assembler/Linker                    PC-DOS/MS-DOS; OS/2    TMDS3243850-02

Simulator                                 VAX VMS                TMDS3243251-08
                                          PC-DOS/MS-DOS          TMDS3243851-02
                                          SUN UNIX†              TMDS3243551-09

SPOX OS Software for ’C3x Target Board    PC-DOS/MS-DOS          TMDS3240132


Hardware

Evaluation Module (EVM)                   PC-DOS/MS-DOS          TMDS3260030


† Note that SUN UNIX supports TMS320C3x software tools on the 68000 family-based SUN-3 series workstations
and on the SUN-4 series machines that use the SPARC processor, but not on the SUN-386i series of workstations.
B.2 Device and Development Support Tool Prefix Designators


Prefixes to Texas Instruments’ part numbers designate phases in the product’s 
development stage for both devices and support tools, as shown in the follow- 
ing definitions: 


Device Development Evolutionary Flow: 


TMX     Experimental device that is not necessarily representative of the final
        device’s electrical specifications.


TMP     Final silicon die that conforms to the device’s electrical specifications
        but has not completed quality and reliability verification.


TMS     Fully qualified production device.


Support Tool Development Evolutionary Flow:


TMDX    Development support product that has not yet completed Texas
        Instruments’ internal qualification testing for development systems.


TMDS    Fully qualified development support product.


TMX and TMP devices and TMDX development support tools are shipped with
the following disclaimer:


“Developmental product is intended for internal evaluation purposes.” 


Note: 


Texas Instruments recommends that prototype devices (TMX or TMP) not 
be used in production systems because their expected end-use failure rate 
is undefined but predicted to be greater than standard qualified production 
devices. 


TMS devices and TMDS development support tools have been fully character- 
ized, and their quality and reliability have been fully demonstrated. Texas In- 
struments’ standard warranty applies to TMS devices and TMDS development 
support tools. 


TMDX development support products are intended for internal evaluation pur- 
poses only. They are covered by Texas Instruments’ Warranty and Update 
Policy for Microprocessor Development Systems products; however, they 
should be used by customers only with the understanding that they are devel- 
opmental in nature. 
B.3 Device Suffixes


The suffix indicates the package type (e.g., N, FN, or GB) and temperature 
range (e.g., L). 


Figure B—1 presents a legend for reading the complete device name for any 
TMS320 family member. 


Figure B-1. TMS320 Device Nomenclature

Example:  TMS 320 C 30 GB L
          (prefix, device family, technology, device, package type,
          temperature range)

Prefix
    TMX = Experimental device
    TMP = Prototype device
    TMS = Qualified device
    SMJ = MIL-STD-883C

Device Family
    320 = TMS320 family

Technology
    C  = CMOS
    E  = CMOS EPROM
    LC = Low-power CMOS
    P  = One-time programmable

Device
    1st-generation DSP: 10, 14, 15, 16, 17
    2nd-generation DSP: 20, 25, 26, 28
    3rd-generation DSP: 30, 31
    4th-generation DSP: 40
    5th-generation DSP: 50, 51, 53

Package Type
    N  = Plastic DIP
    JD = Ceramic DIP, side-brazed
    FN = Plastic leaded CC
    GB = Ceramic PGA
    FJ = Ceramic leaded CC
    FD = Leadless ceramic CC
    FZ = Ceramic leaded CC
    GE = Ceramic PGA, glass seal
    HU = Ceramic quad flatpack
    HT = Ceramic quad flatpack (gull wing)
    PQ = Plastic quad flatpack

Temperature Range
    H = 0 to 50°C
    L = 0 to 70°C
    S = -55 to 100°C
    M = -55 to 125°C
    A = -40 to 85°C
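
The legend in Figure B-1 can be applied mechanically. The following is a
minimal sketch, not a TI tool, that splits a device name into its fields using
the codes listed in the figure. The function name and the returned dictionary
layout are hypothetical. Treating a trailing two-digit group as a speed in MHz
is an inference from Table B-1 (for example, TMS320C31PQL40 at 40 MHz) and is
not part of the figure itself.

```python
import re

# Code tables transcribed from the Figure B-1 legend.
PREFIXES = {"TMX": "Experimental device", "TMP": "Prototype device",
            "TMS": "Qualified device", "SMJ": "MIL-STD-883C"}
TECHNOLOGIES = {"C": "CMOS", "E": "CMOS EPROM", "LC": "Low-power CMOS",
                "P": "One-time programmable"}
PACKAGES = {"N": "Plastic DIP", "JD": "Ceramic DIP, side-brazed",
            "FN": "Plastic leaded CC", "GB": "Ceramic PGA",
            "FJ": "Ceramic leaded CC", "FD": "Leadless ceramic CC",
            "FZ": "Ceramic leaded CC", "GE": "Ceramic PGA, glass seal",
            "HU": "Ceramic quad flatpack",
            "HT": "Ceramic quad flatpack (gull wing)",
            "PQ": "Plastic quad flatpack"}
TEMP_RANGES = {"H": "0 to 50°C", "L": "0 to 70°C", "S": "-55 to 100°C",
               "M": "-55 to 125°C", "A": "-40 to 85°C"}


def _alternation(codes):
    """Build a regex alternation, longest codes first ("LC" before "C")."""
    return "|".join(sorted(codes, key=len, reverse=True))


# ASSUMPTION: an optional trailing two-digit group is the speed in MHz.
PART_RE = re.compile(
    rf"^({_alternation(PREFIXES)})(320)({_alternation(TECHNOLOGIES)})"
    rf"(\d{{2}})({_alternation(PACKAGES)})({_alternation(TEMP_RANGES)})"
    rf"(\d{{2}})?$"
)


def decode_part_number(part: str) -> dict:
    """Split a TMS320 device name into the fields shown in Figure B-1."""
    match = PART_RE.match(part.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized part number: {part!r}")
    prefix, _family, tech, device, package, temp, speed = match.groups()
    return {
        "prefix": PREFIXES[prefix],
        "family": "TMS320 family",
        "technology": TECHNOLOGIES[tech],
        "device": device,
        "package": PACKAGES[package],
        "temperature_range": TEMP_RANGES[temp],
        "speed_mhz": int(speed) if speed else None,
    }


print(decode_part_number("TMS320C30GBL"))
```

Sorting the codes longest-first keeps the two-letter technology code LC from
being read as C, and keeps two-letter package codes from being shadowed by the
single-letter N.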
three operand, 2-9


algebraic reordering, 5-4 
analysis subsystem, 5-31 
analyzer 
HP 64776, 5-31 
logic, 5-32 
systematic, 5-31 
ANSI C compiler, 5-2 
application, examples, 4-1—4-12 
application(s), software, 5-35—5-41 
architectural overview, TMS320C31, 2-1—2-32 
archiver, 5-20 
arithmetic, instruction set summary, 2-16—2-17 
arithmetic logic unit (ALU), 2-8 
assembler, TMS320, 5-19 
assembler/linker, Loughborough Sound Images Ltd., 
F-17 
assemblers 
Loughborough Sound Images Ltd., F-17 
Tartan Inc., F-33 


assembly source debugger, 5-15 
autoincrement addressing modes, 5-11 
auxiliary register ALUs, 2-8 


BBS. See Bulletin Board Service 
benefits, ’C31-based embedded system, 1-8-1-10 


Biomation, F-9-F-11 
block diagram, TMS320C31, 2-3 
Bulletin Board Service, 5-34 
bulletins, 5-33 
bus operation 

external, 2-28 

internal, 2-24 
Byte-BOS, F-12 


C compiler, TMS320, 5-2 
C source debugger, 5-15 
C/assembly source debugger. See TMS320 programmers interface
cache memory, 2-20 
See also memory 
central processing unit, 2-4—2-19 
code-generation tools 
assembler, 5-19 
C compiler, 5-2 
linker, 5-19 
macro assembler, 5-19 
COFF, 5-19, 5-20 
compatible devices, TMS320C3x, 1-5 
compiler 
addressing modes, 5-11 
algebraic reordering, 5-4 
branch optimizations, 5-6 
code motion, 5-7 
conditional instructions, 5-13 
constant folding, 5-4 
control-flow, 5-6 
copy propagation, 5-5 
data flow optimizations, 5-4 
delayed instructions, 5-12 
disambiguation, 5-4 
function calls, 5-7 
inline expansion, 5-7 
loop induction variable optimizations, 5-7 
loop rotation, 5-7 
loop unrolling, 5-14 
loop-invariant code motion, 5-7 
parallel instructions, 5-13 
redundant elimination, 5-5 
register allocation, 5-11 
register targeting, 5-10 
register tracking, 5-10 
register variables 
fixed-point, 5-10 
floating-point, 5-10 
repeat blocks, 5-11 
rotation, 5-7 
strength reduction, 5-7 
subexpression elimination, 5-5 
symbolic simplification, 5-4 
TMS320 optimizing ANSI C, 5-2 
TMS320C25, 5-1 
TMS320C26, 5-1 
TMS320C50, 5-1 
TMS320C51, 5-1 
unrolling, 5-14 
Computer Motion, Inc., F-13 
conditional instructions, 5-13 
conditional-branch addressing modes, 2-9 
constant folding, 5-4 
control, Tartan Inc., F-34 
copy propagation, 5-5 
CPU, 2-4 
CPU registers, 2-6 
auxiliary (AR0-AR7), 2-7 
block size (BK), 2-7 
data page pointer, 2-7 
extended precision (R0-R7), 2-7 
I/O flags (IOF), 2-7 
index (IR1, IR0), 2-7 
interrupt enable (IE), 2-7 
interrupt flag (IF), 2-7 
program counter (PC), 2-8, 2-24 
repeat count (RC), 2-8 
repeat end address (RE), 2-8 
repeat start address (RS), 2-8 
status register (ST), 2-7 
system stack pointer (SP), 2-7 
CPU1/2 buses, 2-24 
Customer Response Center (CRC), 5-33 
data sheets, 5-33
data-acquisition, equipment, 4-5 
debug and system integration tools 
analysis subsystem, 5-31 
assembly source debugger, 5-15 
C source debugger, 5-15 
debugger, 5-15 
emulators, 5-26 
evaluation module (EVM), 5-24 
HP 64776, 5-31 
simulator, 5-21 
debugger, 5-15 
display, basic, 5-15 
delayed instructions, 5-12 
design assistance, 5-37 
Details on Signal Processing, 5-34 
disambiguation, 5-4 
DMA 
architecture, 2-27 
buses, 2-24 
general, 2-27 
Doble M series system, 4-10 
Doble test, 4-9-4-12 
documentation, 5-33 
DSP 
Bulletin Board Services (BBS), 5-34 
Details on Signal Processing (newsletter), 5-34 
Hotline, 5-34 
seminars, 5-35 
DSP industry, TI role, A-3 
DSP market, A-2 


Electronic Tools GmbH, F-14 
embedded systems, 1-1 
embedded-controller, requirements, 1-2 
embedded-systems, block diagram, 2-2 
emulator 

analysis subsystem, 5-31 

HP 64776, 5-31 

scan-based, 5-26 

TMS320C3x Target Board, 5-30 

XDS, 5-27 

XDS tools, 5-26 


EPROM programmer, Loughborough Sound Images 
Ltd., F-17 
evaluation module (EVM), introduction, 5-24 


external buses (expansion, primary), 2-28 
external interrupts, 2-29 


FAX services, 5-34 
FFT, 4-6, 4-7 